Single Cell Genomic Sequencing Using Hydrogel Based Droplets

ABSTRACT

The present disclosure provides ultrahigh-throughput single cell genomic sequencing methods, referred to herein as “SiC-seq”, which methods include encapsulating single cells in molten gel droplets to facilitate bulk cell lysis and purification of genomic DNA in microgels. Systems and devices for practicing the subject methods are also provided.

CROSS-REFERENCE

This application claims the benefit of U.S. Provisional Patent Application No. 62/437,605 filed Dec. 21, 2016, which application is incorporated herein by reference in its entirety.

GOVERNMENT SUPPORT

This invention was made with government support under grant nos. AR068129, R01 EB019453 and R21 HG007233 awarded by the National Institutes of Health; grant no. 1253293 awarded by the National Science Foundation; grant no. HR0011-12-C-0065 awarded by the Department of Defense, Defense Advanced Research Projects Agency; and grant no. N66001-12-C-4211 awarded by the Space and Naval Warfare Systems Center. The government has certain rights in the invention.

INTRODUCTION

A common challenge when applying single cell sequencing to heterogeneous systems is that they often contain massive numbers of cells: A centimeter-sized tumor can contain hundreds of millions of mutated cancer cells, while a milliliter of sea water can contain millions of microbes. Moreover, each cell has a tiny quantity of DNA, making it challenging to accurately amplify and sequence so many single cells. Methods based on optical tweezers, flow cytometry, microfluidics, gel encapsulation, and virtual microfluidics can isolate and process hundreds of single cells for sequencing, but this constitutes a minute fraction of most communities. The sparseness of the sampling limits the questions that can be addressed, with the majority of findings relating to the most abundant subpopulations. For example, common environmental communities contain >1500 taxa, with rare taxa present at <0.1%, most of which are missed by single cell sequencing; indeed, the difficulty of capturing these cells is the basis of “microbial dark matter”—the overwhelming abundance of species thought to exist, but that have never been characterized. Therefore, a method that could markedly increase the number of cells sequenced through single cell sequencing would impact a broad range of problems across biology where heterogeneity is important.

SUMMARY

The present disclosure provides ultrahigh-throughput single cell genomic sequencing methods, referred to herein as “SiC-seq”, which methods include encapsulating single cells in molten gel droplets to facilitate bulk cell lysis and purification of genomic DNA in microgels. Systems and devices for practicing the subject methods are also provided.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention may be best understood from the following detailed description when read in conjunction with the accompanying drawings. Included in the drawings are the following figures:

FIG. 1 provides schematics of microfluidic devices according to one embodiment of the present disclosure used to a) generate barcode droplets and encapsulate cells in microgels; b) re-encapsulate gels with tagmentation reagents; and c) merge gel droplets with barcode droplets and PCR droplets.

FIG. 2 provides a schematic of an example SiC-seq workflow according to one embodiment of the present disclosure. Single cells are encapsulated in microgels and the genomes are purified and fragmented, e.g., in a series of detergent and enzyme washes. The genomic fragments are then labeled with nucleic acid, e.g., DNA, barcode sequences unique to each droplet. The resulting barcoded genomic fragments are pooled and sequenced, generating reads that can be grouped by single cell based on a shared barcode. The groups of reads comprise a database of low coverage genomes of single cells, which can be analyzed, e.g., using in silico cytometry.

FIGS. 3A-3C provide schematics of microfluidic and biochemical workflow to generate a SiC-seq library according to one embodiment of the present disclosure. FIG. 3A) Barcode droplets are generated by encapsulating random DNA oligos at limiting dilution with PCR reagents using a flow focus droplet maker. The droplets are thermal cycled, yielding a droplet containing a clonal population of a unique barcode sequence for every ~9 empty droplets (SYBR stained for visualization). FIG. 3B) Cells, e.g., bacteria, are encapsulated at limiting dilution with molten gel, e.g., molten agarose, to generate single cell containing microgels, e.g., single cell containing agarose microgels. The single cell genomes are purified through a series of bulk washes, e.g., detergent and enzyme washes. The purified single cell genomes are then re-encapsulated and labeled, e.g., tagmented. FIG. 3C) The labeled, e.g., tagmented, genome-containing microgels are merged with droplets containing barcode and nucleic acid amplification reagents, e.g., PCR reagents. During thermal cycling the barcodes splice onto the labeled, e.g., tagmented, genome fragments, generating chimeric molecules consisting of the barcode attached to a random fragment of the cell genome ready for massively parallel sequencing.

FIG. 4 depicts microscope images and plots characterizing the diffusion of genomic fragments inside agarose microgels. a) SYBR staining was used to monitor diffusion of genomes in microgels before and after tagmentation. b) After two days at room temperature, the microgels were pelleted by centrifugation and DNA was extracted from the microgels and the supernatant and quantified using the Qubit dsDNA high sensitivity assay and bioanalyzer high sensitivity DNA chip. The shift in fragment size was relatively minor as a result of the relatively low stoichiometric ratio of transposase to genome used. c) Encapsulated genomes were reacted with a higher stoichiometric ratio of transposase to genome and were visualized on a bioanalyzer high sensitivity chip to show fragmentation efficiency of the gel-encapsulated genomes.

FIGS. 5A-5E depict plots demonstrating the performance of SiC-seq on an artificially constructed microbial community. FIG. 5A) Distribution of reads in each barcode group. FIG. 5B) Histogram of the purity of each barcode group, which is defined as the fraction of reads mapping to the most mapped species for that group. FIG. 5C) Relative abundance estimates of each species, calculated using, from left to right for each species: reads classified using Bowtie2 (Bowtie2 Reads), barcodes classified using Bowtie2 (Bowtie2 Barcodes), and reads classified using Kraken (Kraken reads). FIG. 5D) Relative coverage distribution for reads aggregated from all barcode groups for each microbe. FIG. 5E) Coverage histogram binned by relative coverage. See FIGS. 6A and 6B for coverage maps of other species.

FIGS. 6A and 6B depict the distribution of SiC-seq reads obtained from sequencing the mixed community of known microbes. FIG. 6A) Aggregate coverage over Bacillus subtilis and Saccharomyces cerevisiae reference genomes in the SiC-seq validation dataset for 10 kb bins. FIG. 6B) Mapping positions of reads for randomly chosen Staphylococcus barcode groups with >2000 reads.

FIG. 7 depicts a schematic illustrating an example framework of a SiC-Reads database. Reads which contain properties such as sequence, read ID, and taxonomy are stored inside barcode groups which contain properties such as purity and taxonomy. Sequences shown are for example purposes only and are not limiting.
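The framework of FIG. 7 can be illustrated with a minimal Python sketch. It is not the database implementation of the present disclosure. The class names `Read` and `BarcodeGroup` are hypothetical, the group taxonomy is assumed to be the most frequent taxon among classified reads, and purity is assumed to be computed over classified reads only.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Read:
    """One sequencing read with the read-level properties named for FIG. 7."""
    read_id: str
    sequence: str
    taxonomy: Optional[str] = None  # None when the read could not be classified


@dataclass
class BarcodeGroup:
    """All reads sharing one barcode, i.e., putatively derived from one cell."""
    barcode: str
    reads: List[Read] = field(default_factory=list)

    def _taxon_counts(self) -> Counter:
        # Assumption: unclassified reads are ignored when summarizing a group.
        return Counter(r.taxonomy for r in self.reads if r.taxonomy is not None)

    @property
    def taxonomy(self) -> Optional[str]:
        """Most frequently assigned taxon among the classified reads."""
        counts = self._taxon_counts()
        return counts.most_common(1)[0][0] if counts else None

    @property
    def purity(self) -> float:
        """Fraction of classified reads assigned to the most common taxon."""
        counts = self._taxon_counts()
        total = sum(counts.values())
        return counts.most_common(1)[0][1] / total if total else 0.0


# Example with made-up sequences, for illustration only.
example_group = BarcodeGroup(
    barcode="ACGTACGT",
    reads=[
        Read("r1", "AAAA", "Bacillus"),
        Read("r2", "CCCC", "Bacillus"),
        Read("r3", "GGGG", "Staphylococcus"),
        Read("r4", "TTTT"),
    ],
)
```

In this example the group is assigned to Bacillus with a purity of 2/3, because two of the three classified reads map to that taxon. Storing purity and taxonomy as derived properties keeps them consistent with the reads held in the group.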

FIG. 8 illustrates the marine microbial community used to demonstrate in silico cytometry as described in the Examples herein. a) Taxonomic abundance of the SiC-Reads database by barcode groups. b) Distribution of purity of barcode groups in the database at the genus level.

FIGS. 9A-9C illustrate the application of SiC-seq to a marine community recovered from the San Francisco coastline. FIG. 9A) Distribution of antibiotic resistance (AR) genes according to genus of host microbe. Using in silico cytometry, the association of AR genes with the taxonomic classification is deduced. The opacity of connecting lines reflects the number of interactions detected in the database. FIG. 9B) Relative abundance of virulence factors in each genus detected in the community. The virulence ratio is the ratio between the number of barcode groups observed with virulence factors and the number of total barcode groups for that species, normalized to a scale from 0 to 1. FIG. 9C) Relative potential for transduction between bacterial taxa in a community plotted as a heat map.

FIG. 10 depicts the reference data obtained by simulating reads from genomic sequences of isolated strains for comparison against data in the marine microbial community as described in the Examples herein. a) Antibiotic resistance network for whole genome sequenced strains in public databases. b) Virulence factor ratios calculated for publicly available strains.

FIG. 11 depicts plots showing the average and distribution of genome coverage of each barcode group plotted as a Lorenz curve for each species.

FIG. 12 depicts a plot showing the genome size-normalized purity scores of barcode groups in the 10-cell control experiment. Genome size-normalized purity scores are calculated by the same method as the purity scores, but using the fraction of the genome sequenced for each respective species rather than the raw number of reads.

FIG. 13 depicts plots showing the purity scores of barcode groups separately plotted for each species.

FIG. 14 depicts plots showing the purity scores of the next-most abundant species in a) barcode groups of purity <80%, b) barcode groups of purity >80%. In barcode groups with <80% purity, the purity scores of the next-most abundant species tend to be high, from ~20% to 50%, reflecting that those two species represent the majority of the reads in the barcode group and suggesting that these barcode groups represent double encapsulations. Barcode groups with 100% purity are not represented in the plots.

FIG. 15 depicts a plot showing SiC-seq performance on an artificially constructed microbial community consisting of Staphylococcus, Bacillus, and Saccharomyces. Relative abundance estimates of each species are calculated using, from left to right for each species: marker gene counting without barcodes (Metaphlan), barcode counting (Barcode), manual counting under the microscope after cell encapsulation (Microscope count), and manual counting while in culture (Theoretical).

FIG. 16 depicts plots showing the aggregate genomic coverage of all the barcode groups for species in the synthetic microbial community. Species at low abundance show frequent dropouts characterized by dips in the graph, but instances of systematic bias characterized by sharp peaks are rarely observed.

FIG. 17 provides a schematic depicting a workflow for barcoding genomic DNA using asymmetric digital droplet PCR barcodes fused to MALBAC amplicons using a single-cycle extension.

FIG. 18 provides a schematic depicting a workflow for barcoding genomic DNA using symmetric digital droplet PCR barcodes fused to MALBAC amplicons using an overlap extension followed by multiple rounds of PCR.

FIG. 19 provides a schematic depicting a molecular barcoding scheme employing barcoded MALBAC primers producing a combinatorically-barcoded looped amplicon.

FIG. 20 provides a schematic depicting a microfluidic workflow for generating combinatorically barcoded genomic DNA amplicons.

FIG. 21 provides a schematic of a microfluidic device for generation of 30 μm water-in-oil emulsions containing a mixture of two aqueous phases. For digital barcode droplet generation, the device is operated with Inlet 1 plugged.

FIG. 22 provides a schematic of a microfluidic device for merger of a MALBAC-amplified cell droplet with a digital PCR barcode droplet. The shaded rectangle indicates the merger region where droplets subjected to a high electric field gradient merge.

FIG. 23 provides a representative electropherogram of DNA products resulting from an exemplary MALBAC barcode fusion reaction (before size-selection).

FIG. 24 provides a representative electropherogram of DNA products resulting from an exemplary MALBAC barcode fusion reaction (after size-selection).

FIG. 25 depicts a histogram displaying the frequencies of barcode group purities for all barcode groups in Example 2. The inset is plotted on a log-scale. The average purity of all barcode groups in the experiment is 0.950 (min. group size of 50 reads).

FIG. 26 depicts a scatter plot of barcode group purity vs. the number of reads in a barcode group for Example 2. Each point represents a barcode group. Only barcode groups with a minimum of 500 reads are shown.

FIG. 27 depicts a genome-wide coverage map for a representative barcode group (species: Bacillus subtilis). A dot is placed at all positions along the Bacillus subtilis genome where reads from the barcode group align.

DETAILED DESCRIPTION

The present disclosure provides methods of sequencing single cell genomic DNA for analyzing, e.g., metagenomes, copy number variants, and the genetic profile of complex biological samples. The methods described herein facilitate high-throughput processing of populations of single cells and subsequent sequencing of genomic DNA. Systems and devices for practicing the subject methods are also provided.

Before the present invention is described in greater detail, it is to be understood that this invention is not limited to the particular embodiments described, as such may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, some potential and exemplary methods and materials may now be described. Any and all publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. It is understood that the present disclosure supersedes any disclosure of an incorporated publication to the extent there is a contradiction.

It must be noted that as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a droplet” includes a plurality of such droplets unless the context clearly dictates otherwise.

It is further noted that the claims may be drafted to exclude any element, e.g., any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely”, “only” and the like in connection with the recitation of claim elements, or the use of a “negative” limitation.

The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed. To the extent the disclosure or the definition or usage of any term herein conflicts with the disclosure or the definition or usage of any term in an application or publication incorporated by reference herein, the instant application shall control.

As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. Any recited method can be carried out in the order of events recited or in any other order which is logically possible.

The terms “nucleic acid barcode sequence”, “nucleic acid barcode”, “barcode”, and the like as used herein refer to a nucleic acid having a sequence which can be used to identify and/or distinguish one or more first molecules to which the nucleic acid barcode is conjugated from one or more second molecules. Nucleic acid barcode sequences are typically short, e.g., about 5 to 20 bases in length, and may be conjugated to one or more target molecules of interest or amplification products thereof. Nucleic acid barcode sequences may be single or double stranded.

The terms “nucleic acid”, “nucleic acid molecule”, “oligonucleotide” and “polynucleotide” are used interchangeably and refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. The terms encompass, e.g., DNA, RNA and modified forms thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown. Non-limiting examples of polynucleotides include a gene, a gene fragment, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, control regions, isolated RNA of any sequence, nucleic acid probes, and primers. The nucleic acid molecule may be linear or circular.

The term “nucleic acid sequence” or “oligonucleotide sequence” refers to a contiguous string of nucleotide bases and in particular contexts also refers to the particular placement of nucleotide bases in relation to each other as they appear in an oligonucleotide. Similarly, the term “polypeptide sequence” or “amino acid sequence” refers to a contiguous string of amino acids and in particular contexts also refers to the particular placement of amino acids in relation to each other as they appear in a polypeptide.

As used herein the term “isolated,” when used in the context of an isolated cell, refers to a cell of interest that is in an environment different from that in which the cell naturally occurs. “Isolated” is meant to include cells that are within samples that are substantially enriched for the cell of interest and/or in which the cell of interest is partially or substantially purified.

The terms “droplets”, “droplet” and the like are used herein to refer to emulsion-based compartments capable of encapsulating and/or containing one or more single cells as described herein and/or one or more barcodes as described herein. Droplets may include a first fluid phase, e.g., an aqueous phase (e.g., water or hydrogel), bounded by a second fluid phase (e.g., oil) which is immiscible with the first fluid phase. In some embodiments, the second fluid phase will be an immiscible phase carrier fluid. Thus droplets according to the present disclosure may be provided as aqueous-in-oil emulsions. The term “droplet” is also used herein in the context of “solidified microgel droplets” to refer to a molten gel containing droplet or droplets, wherein the molten gel has been solidified leaving a solidified microgel surrounded by an immiscible phase film, e.g., an oil film. Droplets as used or generated in connection with the subject methods, devices, and/or systems may be sphere shaped or they may have any other suitable shape, e.g., an ovular or oblong shape. Droplets as described herein may include a liquid phase and/or a solid phase material. In some embodiments, droplets according to the present disclosure include a gel material. In some embodiments, the subject droplets have a dimension, e.g., a diameter, of or about 1.0 μm to 1000 μm, inclusive, such as 1.0 μm to 750 μm, 1.0 μm to 500 μm, 1.0 μm to 100 μm, 1.0 μm to 10 μm, or 1.0 μm to 5 μm, inclusive. In some embodiments, droplets as described herein have a dimension, e.g., diameter, of or about 1.0 μm to 5 μm, 5 μm to 10 μm, 10 μm to 100 μm, 100 μm to 500 μm, 500 μm to 750 μm, or 750 μm to 1000 μm, inclusive. Furthermore, in some embodiments, droplets as described herein have a volume ranging from about 1 fL to 1 nL, inclusive, such as from 1 fL to 100 pL, 1 fL to 10 pL, 1 fL to 1 pL, 1 fL to 100 fL, or 1 fL to 10 fL, inclusive. 
In some embodiments, droplets as described herein have a volume of 1 fL to 10 fL, 10 fL to 100 fL, 100 fL to 1 pL, 1 pL to 10 pL, 10 pL to 100 pL or 100 pL to 1 nL, inclusive. In addition, droplets as described herein may have a size and/or shape such that they may be produced in, on, or by a microfluidic device and/or flowed from or applied by a microfluidic device.
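For a droplet that is approximately spherical, the diameter and volume ranges recited above are related by V = πd³/6. The following Python sketch illustrates the conversion under that assumption only; non-spherical droplets are not covered, and the function names are hypothetical. It relies on the identity that one cubic micrometer equals one femtoliter.

```python
import math


def droplet_volume_fl(diameter_um: float) -> float:
    """Volume in femtoliters of an ideal sphere with the given diameter in micrometers."""
    if diameter_um <= 0:
        raise ValueError("diameter must be positive")
    # 1 cubic micrometer = 1 fL, so no further unit conversion is needed.
    return math.pi * diameter_um ** 3 / 6.0


def droplet_diameter_um(volume_fl: float) -> float:
    """Diameter in micrometers of an ideal sphere with the given volume in femtoliters."""
    if volume_fl <= 0:
        raise ValueError("volume must be positive")
    return (6.0 * volume_fl / math.pi) ** (1.0 / 3.0)


# A 30 um droplet, the size named for the device of FIG. 21.
volume_30um_fl = droplet_volume_fl(30.0)   # about 14,137 fL
volume_30um_pl = volume_30um_fl / 1000.0   # about 14.1 pL
```

Under the spherical assumption, a 30 μm droplet therefore holds about 14 pL, which falls within the 10 pL to 100 pL range recited above.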

As used herein, the term “carrier fluid” refers to a fluid configured or selected to contain one or more droplets, as described herein. A carrier fluid may include one or more substances and may have one or more properties, e.g., viscosity, which allows it to be flowed through a microfluidic device or a portion thereof. In some embodiments, carrier fluids include, for example: oil or water, and may be in a liquid or gas phase.

As used in the claims, the term “comprising”, which is synonymous with “including”, “containing”, and “characterized by”, is inclusive or open-ended and does not exclude additional, unrecited elements and/or method steps. “Comprising” is a term of art that means that the named elements and/or steps are present, but that other elements and/or steps can be added and still fall within the scope of the relevant subject matter.

As used herein, the phrase “consisting of” excludes any element, step, and/or ingredient not specifically recited. For example, when the phrase “consists of” appears in a clause of the body of a claim, rather than immediately following the preamble, it limits only the element set forth in that clause; other elements are not excluded from the claim as a whole.

As used herein, the phrase “consisting essentially of” limits the scope of the related disclosure or claim to the specified materials and/or steps, plus those that do not materially affect the basic and novel characteristic(s) of the disclosed and/or claimed subject matter.

With respect to the terms “comprising”, “consisting essentially of”, and “consisting of”, where one of these three terms is used herein, the presently disclosed subject matter can include the use of either of the other two terms.

Methods

As summarized above, the present disclosure provides ultrahigh-throughput single cell genomic sequencing methods, referred to herein as SiC-seq, which methods include encapsulating single cells in molten gel droplets to facilitate bulk cell lysis and purification of genomic DNA in microgels. These methods facilitate the sequencing of single cell genomic DNA for analyzing, e.g., metagenomes, copy number variants, and the genetic profile of complex biological samples.

Methods for Sequencing Single Cell Genomic DNA

The present disclosure provides methods for sequencing single cell genomic DNA. Methods of the present disclosure provide ultrahigh-throughput sequencing of single cell genomes. In some embodiments, droplet microfluidics is used to isolate, fragment, and barcode the single cell genomes of a population of cells, allowing single cell genomic DNA to be recovered by grouping reads by barcode.
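The grouping of reads by barcode can be sketched in Python as follows. This is a simplified illustration rather than the analysis pipeline of the present disclosure. It assumes the barcode has already been extracted from each read and that barcodes are matched exactly, without error correction. The function name and the `min_reads` threshold parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_reads_by_barcode(
    reads: Iterable[Tuple[str, str]], min_reads: int = 1
) -> Dict[str, List[str]]:
    """Group (barcode, sequence) pairs by barcode.

    Groups with fewer than min_reads reads are discarded, since very small
    groups carry little information about the originating cell.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return {bc: seqs for bc, seqs in groups.items() if len(seqs) >= min_reads}


# Example with made-up barcodes and sequences, for illustration only.
example_reads = [
    ("AAC", "ACGTTGCA"),
    ("GGT", "TTGACCAT"),
    ("AAC", "GGCATTAC"),
]
example_groups = group_reads_by_barcode(example_reads, min_reads=2)
```

Here only the group for barcode AAC is retained, because it is the only one with at least two reads. Each retained group represents a low coverage sample of the genome of a single cell.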

Isolation of Single Cells:

Using the methods described herein, single cells can be isolated in droplets. In some embodiments, single cells are encapsulated in droplets which may facilitate the process of purifying genomic DNA, without mixing the genomic contents of individual single cells. In some embodiments, encapsulating single cells in droplets is achieved using a microfluidic device that comprises a droplet generator. For example, a population of single cells may be flowed through a channel of a microfluidic device, the microfluidic device including a droplet generator in fluid communication with the channel, under conditions sufficient to effect inertial ordering of the cells in the channel, thereby providing periodic injection of the cells into the droplet generator to encapsulate single cells in individual droplets. In some embodiments, the method of encapsulating single cells in droplets comprises the addition of an immiscible phase fluid, e.g., oil, to generate an emulsion of droplets each containing a single cell. Additional description of cell encapsulation using microfluidic droplet generators is found, e.g., in U.S. Patent Application Publication No. 20150232942, the disclosure of which is incorporated by reference herein in its entirety.

In some embodiments, a droplet in which a single cell is encapsulated comprises a polymeric material. For example, suitable polymeric materials may include interpenetrating polymer networks (IPNs); a synthetic hydrogel; a semi-interpenetrating polymer network (sIPN); a thermoresponsive polymer; and the like. For example, in some embodiments, a suitable polymer comprises a co-polymer of polyacrylamide and poly(ethylene glycol) (PEG). In some embodiments, a suitable polymer comprises a co-polymer of polyacrylamide and PEG, and further comprises acrylic acid.

In some embodiments, a droplet in which a single cell is encapsulated may be a microgel droplet. In such embodiments, a microgel droplet may be a hydrogel droplet comprising a hydrogel polymer. Suitable hydrogel polymers may include, but are not limited to, the following: acetic acid, glycolic acid, acrylic acid, 1-hydroxyethyl methacrylate (HEMA), ethyl methacrylate (EMA), propylene glycol methacrylate (PEMA), acrylamide (AAM), N-vinylpyrrolidone, methyl methacrylate (MMA), glycidyl methacrylate (GDMA), glycol methacrylate (GMA), ethylene glycol, fumaric acid, and the like. Some hydrogel polymers require the use of a cross linking agent. Common cross linking agents include tetraethylene glycol dimethacrylate (TEGDMA) and N,N′-methylenebisacrylamide. The hydrogel droplets can be homopolymeric, or can comprise co-polymers of two or more of the aforementioned polymers. Exemplary hydrogel droplets include, but are not limited to, a copolymer of poly(ethylene oxide) (PEO) and poly(propylene oxide) (PPO); Pluronic™ F-127 (a difunctional block copolymer of PEO and PPO of the nominal formula EO₁₀₀-PO₆₅-EO₁₀₀, where EO is ethylene oxide and PO is propylene oxide); poloxamer 407 (a tri-block copolymer consisting of a central block of poly(propylene glycol) flanked by two hydrophilic blocks of poly(ethylene glycol)); a poly(ethylene oxide)-poly(propylene oxide)-poly(ethylene oxide) co-polymer with a nominal molecular weight of 12,500 Daltons and a PEO:PPO ratio of 2:1; a poly(N-isopropylacrylamide)-based hydrogel (a PNIPAAm-based hydrogel); a PNIPAAm-acrylic acid co-polymer (PNIPAAm-co-AAc); poly(2-hydroxyethyl methacrylate); poly(vinyl pyrrolidone); and the like.

Of particular use in methods described herein are microgel droplets that are able to transform from one state to another, e.g., from a liquid state to a solid state. In some embodiments, a microgel droplet is a hydrogel droplet comprising a hydrogel polymer, wherein the hydrogel polymer is a thermoresponsive polymer. A thermoresponsive polymer generally exhibits a change in its physical properties with temperature. For example, a thermoresponsive polymer may exhibit a volume phase transition at a certain temperature, which causes a sudden change in the solvation state. Thermoresponsive polymers suitable for use in methods of the present disclosure may include those that become soluble upon heating. For example, agarose, e.g., a low gelling temperature agarose, can be suitable for use in the methods described herein. In some embodiments, a suitable thermoresponsive polymer, e.g., a suitable agarose, has a gel point of from about 20° C. to about 40° C., e.g., from about 25° C. to about 35° C., e.g., about 30° C. A thermoresponsive polymer may have a gel point that is distinct from its melting point. As used herein, the term “gel point” of a thermoresponsive polymer refers to the temperature at which a liquid thermoresponsive polymer solidifies, e.g., transitions into a solid state. As used herein, the term “melting point” or “melting temperature” of a thermoresponsive polymer refers to the temperature at which a solid thermoresponsive polymer melts, e.g., transitions into a liquid state. In some embodiments, a suitable thermoresponsive polymer, e.g., a suitable agarose, has a gel point of from about 5° C. to about 45° C., e.g., from about 5° C. to about 10° C., from about 10° C. to about 15° C., from about 15° C. to about 20° C., from about 25° C. to about 30° C., from about 30° C. to about 35° C., from about 35° C. to about 40° C., from about 40° C. to about 45° C., e.g., about 20° C. 
In some embodiments, a suitable thermoresponsive polymer, e.g., a suitable agarose, has a gel point of about 20° C.

In some embodiments, a suitable thermoresponsive polymer, e.g., a suitable agarose, has a melting point of from about 60° C. to about 95° C., e.g., from about 60° C. to about 65° C., from about 65° C. to about 70° C., from about 70° C. to about 75° C., from about 75° C. to about 80° C., from about 80° C. to about 85° C., from about 85° C. to about 90° C., from about 90° C. to about 95° C., e.g., about 60° C. In some embodiments, a suitable thermoresponsive polymer, e.g., a suitable agarose, has a melting point of about 60° C.

In some embodiments, single cells can be encapsulated in molten gel droplets, e.g., molten agarose gel droplets, which can be solidified into solidified microgel droplets. In some embodiments, molten agarose gel droplets are solidified by cooling. In some embodiments, a microgel droplet can comprise a polymer that is transformed into a solid state upon cross linking. For example, hydrogel droplets can comprise acrylamide and solidify upon chemical and/or photo cross linking. For example, microgel droplets can comprise poly(ethylene glycol) (PEG) and solidify upon chemical and/or photo cross linking. In some embodiments, hydrogel droplets can comprise alginate, which is solidified upon the addition of calcium.

In some embodiments, a microgel droplet for use in the methods described herein includes or forms a solidified microgel having pores sized to retain nucleic acids within the solidified microgel. In some embodiments, a solidified microgel includes pores sized to retain nucleic acids within the solidified microgel, but allows other materials to move in and out of the solidified microgel. For example, large macromolecules (e.g., genomic DNA) are retained in the solidified microgel, while other materials such as lipids and proteins are able to move out of the solidified microgel, e.g., as a result of one or more washing steps. The pore size of a solidified microgel is a function of the microgel type and concentration used. In some embodiments, the solidified microgel is a solidified agarose microgel having pore sizes that are a function of the agarose type and concentration used. In some embodiments, a solidified microgel made from a microgel formulation at about 1.5% to about 2% concentration, e.g., at about 1.3% to about 1.5%, at about 1.4% to about 1.6%, at about 1.5% to about 1.7%, at about 1.6% to about 1.8%, at about 1.7% to about 1.9%, at about 1.8% to about 2.0%, at about 1.9% to about 2.1%, can have pore sizes in the range of from about 50 nm to about 150 nm, e.g., from about 30 nm to about 50 nm, from about 50 nm to about 70 nm, from about 70 nm to about 90 nm, from about 90 nm to about 110 nm, from about 110 nm to about 130 nm, from about 130 nm to about 150 nm, from about 150 nm to about 170 nm. A person of ordinary skill in the art will be able to determine a microgel pore size. For example, a microgel pore size may be determined by methods described in Narayanan et al., Journal of Physics: Conference Series. 2006, 28:83-86, the disclosure of which is incorporated by reference herein.

Accordingly, in some embodiments, a method of sequencing single cell genomic DNA as provided by the present disclosure includes: encapsulating a population of single cells in molten gel droplets to provide a population of molten gel droplets, wherein each molten gel droplet of the population contains zero or one cell.
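The condition that each droplet contains zero or one cell is typically approached by loading cells sparsely. As a rough illustration only, and assuming (this is not stated in the present disclosure) that the number of cells per droplet follows Poisson statistics with a mean of `lam` cells per droplet, the following sketch estimates the fraction of droplets that are empty, singly occupied, or multiply occupied. The function name and the example loading value are hypothetical.

```python
import math

def occupancy_fractions(lam):
    """Return (empty, single, multiple) droplet fractions.

    Assumes Poisson loading with mean `lam` cells per droplet;
    this model is an illustrative assumption, not part of the method.
    """
    empty = math.exp(-lam)            # P(k = 0)
    single = lam * math.exp(-lam)     # P(k = 1)
    multiple = 1.0 - empty - single   # P(k >= 2)
    return empty, single, multiple

# Example: a hypothetical loading of 0.1 cells per droplet on average.
empty, single, multiple = occupancy_fractions(0.1)
print(f"empty={empty:.4f} single={single:.4f} multiple={multiple:.4f}")
```

Under this assumed model, a mean loading of 0.1 cells per droplet leaves roughly 0.5% of droplets with more than one cell, at the cost of most droplets being empty.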

Methods for sequencing single cell genomic DNA from a population of single cells as provided herein, may find use in sequencing single cell genomic DNA from a complex mixture of cells. In some embodiments, a population of single cells may be homogeneous or heterogeneous. In some embodiments, a population of single cells may include eukaryotic cells (e.g., mammalian cells, fungal cells, etc.) or prokaryotic cells (e.g., bacterial cells), or a combination thereof. In some embodiments, a population of single cells may be obtained from a variety of sources, e.g., blood samples collected by venipuncture, blood samples collected by finger stick, cerebral spinal fluid samples collected by lumbar puncture, environmental samples, etc.

Purification of Single Cell Genomic DNA:

According to embodiments of the methods described herein, the genomic DNA of individual single cells is purified in bulk, while maintaining isolation of genomic DNA from different cells in different solidified microgels. In some embodiments, bulk purification of single cell genomic DNA is facilitated by the encapsulation of single cells in microgel droplets that can be solidified. For example, a method of purifying single cell genomic DNA from a population of single cells includes encapsulating the population of single cells in molten gel droplets to provide a population of molten gel droplets, wherein each molten gel droplet of the population contains zero or one cell. The population of molten gel droplets is then solidified to provide a population of solidified microgel droplets. The method includes breaking the emulsions of the solidified microgel droplets to provide a population of solidified microgels, and exposing the population of solidified microgels in bulk to lysis conditions.

In some embodiments, the population of solidified microgels is exposed in bulk to lysis conditions sufficient to lyse cells contained within the population of solidified microgels. In some embodiments, lysis conditions include contacting the population of solidified microgels with a lytic enzyme. Those of ordinary skill in the art will recognize that different lytic enzymes could be used depending on the type of cells that are in the solidified microgels. For example, bacterial cells can be lysed using one or more lytic enzymes including, e.g., achromopeptidase, labiase, lysostaphin, lysozyme, mutanolysin, and the like. Yeast cells can be lysed using one or more lytic enzymes including, e.g., zymolyase, kitalase, GLUCANEX, lyticase, and the like. Plant cells can be lysed using one or more lytic enzymes including, e.g., cellulase, pectinase, pectolyase, and the like. Mammalian cells can be lysed using one or more lytic enzymes including, e.g., tetanolysin, α-hemolysin, streptolysin O, and the like. A person of ordinary skill in the art will be able to select from multiple lytic enzymes the one, or the combination, that is most suitable for lysing cells of interest. In some embodiments, the disclosed methods specifically include contacting the population of solidified microgels in bulk with two or more different lytic enzymes, e.g., wherein the two or more different lytic enzymes are capable of lysing different cell types. Such contacting may occur simultaneously or in separate temporal steps, e.g., separated by one or more wash steps.

In some embodiments, cell lysis includes contacting the population of solidified microgels in bulk with one or more lytic enzymes, e.g., a mixture of lytic enzymes, and incubating the solidified microgels for a period of time sufficient to lyse the cells, e.g., from about 5 min to about 24 hours, inclusive, e.g., about 10 min to about 24 hours, about 20 min to about 24 hours, about 30 min to about 24 hours, about 40 min to about 24 hours, about 50 min to about 24 hours, about 1 hour to about 24 hours, about 3 hours to about 24 hours, about 6 hours to about 24 hours, about 9 hours to about 24 hours, about 12 hours to about 24 hours, about 15 hours to about 24 hours, about 18 hours to about 24 hours, or about 21 hours to about 24 hours. In some embodiments, cell lysis includes contacting the population of solidified microgels in bulk with one or more lytic enzymes, e.g., a mixture of lytic enzymes, and incubating the solidified microgels for about 10 min to about 20 min, about 20 min to about 30 min, about 30 min to about 1 hour, about 1 hour to about 3 hours, about 3 hours to about 6 hours, about 6 hours to about 9 hours, about 9 hours to about 12 hours, about 12 hours to about 15 hours, about 15 hours to about 18 hours, about 18 hours to about 21 hours, or about 21 hours to about 24 hours. In some embodiments, the population of solidified microgels is incubated with the mixture of lytic enzymes overnight to lyse the cells contained within the solidified microgels.

In some embodiments, the method of purifying single cell genomic DNA from a population of single cells includes contacting the population of solidified microgels with one or more detergents to solubilize cellular material contained within the population of solidified microgels. For example, detergents can solubilize membrane lipids that are released upon cell lysis. Suitable detergents for use in methods of the present disclosure include those that are well known in the art. Common detergents include: sodium dodecyl sulfate (SDS), SDS lauryl, SDS C12, TRITON X-100, TRITON X-114, NP-40, TWEEN-20, TWEEN-80, octyl glucoside, octylthio glucoside, 3-((3-cholamidopropyl) dimethylammonio)-1-propanesulfonate (CHAPS), lithium dodecyl sulfate, and the like. In some embodiments, the disclosed methods specifically include contacting the population of solidified microgels in bulk with two or more different detergents. Such contacting may occur simultaneously or in separate temporal steps, e.g., separated by one or more wash steps.

In some embodiments, the method of purifying single cell genomic DNA from a population of single cells includes contacting the population of solidified microgels with a protease to digest cellular proteins contained within the population of solidified microgels. Suitable proteases for use in methods of the present disclosure include those that are well known in the art, e.g., a serine protease, a subtilisin-type protease, e.g., proteinase K, brofasin, and the like. In some embodiments, the population of solidified microgels is incubated with proteinase K under conditions sufficient to digest cellular proteins, e.g., at 50° C. for 30 minutes or any other suitable temperature and time sufficient to digest cellular proteins contained within the population of solidified microgels.

As described herein, upon cell lysis within the population of solidified microgels, large molecular weight macromolecules (e.g., genomic DNA) are trapped within the solidified microgels. Hence, the genomic DNA from one cell does not mix with the genomic DNA of another cell, efficiently compartmentalizing the genomic DNA of each individual single cell into its respective solidified microgel. Due to the porosity of the solidified microgels, smaller sized molecules such as lytic enzymes, proteases, and detergents are able to freely enter the solidified microgel to, e.g., digest proteins, digest fragments of cell walls and solubilize lipids. Following successful cell lysis, the genomic DNA can then be purified.

In some embodiments, the population of solidified microgels is washed to remove lytic enzymes and/or detergents and other chemical species which may inhibit downstream molecular biology reactions, such as PCR reactions. The population of solidified microgels is washed by contacting the population of solidified microgels with a washing buffer. In some embodiments, washing the solidified microgels includes contacting the population of solidified microgels with a series of washing buffers. In some embodiments, the washing buffer includes TWEEN-20 and ethanol, although any suitable washing buffer known in the art may be utilized.

Accordingly, a method of sequencing single cell genomic DNA as provided by the present disclosure includes: encapsulating a population of single cells in molten gel droplets to provide a population of molten gel droplets, wherein each molten gel droplet of the population contains zero or one cell; solidifying the population of molten gel droplets to provide a population of solidified microgel droplets; breaking the emulsions of the solidified microgel droplets to provide a population of solidified microgels; exposing the population of solidified microgels in bulk to lysis conditions sufficient to lyse cells contained within the population of solidified microgels; and purifying genomic DNA from cells contained within the population of solidified microgels in bulk to provide a population of solidified microgels including purified genomic DNA.

An important advantage of the methods provided in the present disclosure is that exposing the population of solidified microgels in bulk to lysis conditions sufficient to lyse cells contained within the population of solidified microgels allows for the application of harsh (stringent) lysis conditions that may not be compatible with lysis methods that are performed in-droplet. For example, use of strong detergents (e.g., sodium dodecyl sulfate) and chemicals afforded by the methods provided herein, may destabilize water-in-oil emulsion droplets. Hence, the ability to use strong detergents and chemicals during lysis in the present methods may allow for the study of more diverse cell types. In addition, exposing the population of solidified microgels in bulk to lysis conditions allows for the efficient application of different lysis conditions, e.g., in separate steps (e.g., separated by one or more wash steps), which are designed or sufficient to lyse different cell types which may be present in the original cell population.

Fragmenting and Tagging Genomic DNA

The disclosed methods may include a step of fragmenting the genomic DNA, e.g., to a length that permits its sequencing with existing sequencing platforms, which often have limited read length. Fragmentation can be achieved in a variety of ways and can be applied to either amplified or non-amplified nucleic acid targets. For example, enzymes capable of fragmenting DNA such as Fragmentase® or other nucleases can be introduced into a microgel droplet, a solidified microgel droplet, and/or a solidified microgel as described herein and the microgel droplet, solidified microgel droplet, and/or the solidified microgel subjected to conditions sufficient for fragmentation. Suitable enzymes capable of fragmenting DNA may include, e.g., DNAse I, micrococcal nuclease, DNAse III, and any other nuclease that results in fragmented DNA, including nucleases with sequence specific catalysis. Alternatively, chemical methods can be used, such as the inclusion of acids, reactive oxygen species, etc. Organisms that degrade DNA can also be used by including them in the microgel droplet, solidified microgel droplet, and/or solidified microgel with the nucleic acids. Physical methods, such as shear generated by flow of the nucleic acids in the microgel droplet, solidified microgel droplet, and/or solidified microgel, can also be used. Other methods can also be used that perform multiple operations on the nucleic acids including fragmentation. For example, transposons can be used to insert or attach sequences into the nucleic acids, often fragmenting them in the process.

Accordingly, in some embodiments, the fragmented genomic DNA may be size selected for DNA fragments in the 200-600 bp range. For example, the fragmented genomic DNA may be size selected in the 50-750 bp range, 75-725 bp range, 100-700 bp range, 125-675 bp range, 150-650 bp range, 175-625 bp range, or any range bound between two of the following sizes: 25 bp, 50 bp, 75 bp, 100 bp, 125 bp, 150 bp, 175 bp, 200 bp, 225 bp, 250 bp, 275 bp, 300 bp, 325 bp, 350 bp, 375 bp, 400 bp, 425 bp, 450 bp, 475 bp, 500 bp, 525 bp, 550 bp, 575 bp, 600 bp, 625 bp, 650 bp, 675 bp, 700 bp, 725 bp, 750 bp, 775 bp, 800 bp or more. Size selection of the fragmented genomic DNA can be performed by any method known in the art, for example, using agarose gel electrophoresis, solid phase reversible immobilization beads (e.g., AMPure XP beads), microfluidic instruments (e.g., Caliper Labchip XT), commercially available library construction kits (e.g., Sage Science Pippin Prep), etc. Size selection of fragmented genomic DNA may occur after fragmented genomic DNA is obtained, after the fragmented genomic DNA is tagged, or after the tagged, fragmented genomic DNA is barcoded.
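Size selection itself is a physical step, but its effect on a fragment population can be pictured computationally. The following minimal sketch, assuming fragment lengths are already known (e.g., from a hypothetical list of measured sizes), keeps only fragments inside an inclusive window such as the 200-600 bp range mentioned above. The function name and default bounds are illustrative.

```python
def size_select(fragment_lengths, min_bp=200, max_bp=600):
    """Keep fragment lengths (in bp) within the inclusive window [min_bp, max_bp].

    Illustrative only: real size selection is performed physically
    (gel, beads, or instrument), not on a list of numbers.
    """
    if min_bp > max_bp:
        raise ValueError("min_bp must not exceed max_bp")
    return [n for n in fragment_lengths if min_bp <= n <= max_bp]

# Hypothetical fragment lengths in base pairs.
lengths = [80, 200, 350, 600, 601, 1200]
print(size_select(lengths))            # default 200-600 bp window
print(size_select(lengths, 50, 750))   # a wider window from the listed ranges
```

Any other window bounded by two of the listed sizes can be expressed by changing `min_bp` and `max_bp`.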

Accordingly, in some embodiments, the present disclosure provides a method for sequencing single cell genomic DNA including purifying genomic DNA from cells contained within a population of solidified microgels in bulk to provide a population of solidified microgels including purified genomic DNA, and fragmenting the purified genomic DNA to provide a population of solidified microgels including fragmented genomic DNA.

In some embodiments, the population of solidified microgels including purified genomic DNA is re-encapsulated before the step of fragmenting the purified genomic DNA. Accordingly, in some embodiments the present disclosure provides a method for sequencing single cell genomic DNA including purifying genomic DNA from cells contained within a population of solidified microgels in bulk to provide a population of solidified microgels including purified genomic DNA, encapsulating the population of solidified microgels including purified genomic DNA into droplets to provide a population of purified genomic DNA-containing droplets, and fragmenting the purified genomic DNA to provide a population of fragmented genomic DNA-containing droplets.

In some embodiments, encapsulating the population of solidified microgels including purified genomic DNA into droplets to provide a population of purified genomic DNA-containing droplets includes encapsulating the solidified microgels with reagents for use in fragmentation and tagging of the purified genomic DNA. In some embodiments, fragmentation and tagging of genomic DNA occurs simultaneously, e.g., in a tagmentation step, and encapsulating the solidified microgels with reagents for use in fragmentation and tagging of the purified genomic DNA includes encapsulating the solidified microgels with tagmentation reagents, e.g., a complex including a transposase and a transposon. For example, in some embodiments, each of the members of the population of purified genomic DNA-containing droplets includes a complex including a transposase and a transposon.

In some embodiments, a method for sequencing single cell genomic DNA includes purifying genomic DNA from cells contained within a population of solidified microgels in bulk to provide a population of solidified microgels including purified genomic DNA. In some embodiments, the purified genomic DNA is subject to conditions that fragment the purified genomic DNA to provide a population of solidified microgels including fragmented genomic DNA. In some embodiments, the fragmented genomic DNA is optionally tagged with a common adapter sequence. In some embodiments, fragmentation and tagging of genomic DNA occurs simultaneously.

In some embodiments, fragmentation of genomic DNA can be achieved using Fragmentase® (NEB), Transposon Insertion (Nextera), non-specific DNA endonuclease such as DNAseI, or incorporation of modified bases during amplification and cleavage using DNA repair enzymes, such as dUTP incorporation during amplification and specific cleavage using EndoV and uracil glycosylase. Hydrodynamic shearing can also be used to fragment DNA.

In some embodiments, the method includes fragmenting the purified genomic DNA via transposon insertion, e.g., using Tn5 transposon, Mu transposon, or any other suitable transposon known in the art. In such embodiments, the method includes contacting the purified genomic DNA with a complex including a transposase and a transposon. In some embodiments, the complex includes a transposon that includes an adapter sequence. Contacting the purified genomic DNA with the complex results in fragmented genomic DNA including the adapter sequence. In certain embodiments, because of the dimeric nature of transposases, the fragmented genomic DNA remains intact as a macromolecular complex and continues to be retained within the population of solidified microgels. Accordingly, a population of solidified microgels including fragmented genomic DNA optionally including a common adapter sequence is obtained.

Accordingly, an example method of sequencing single cell genomic DNA as provided by the present disclosure includes: encapsulating a population of single cells in molten gel droplets to provide a population of molten gel droplets, wherein each molten gel droplet of the population contains zero or one cell; solidifying the population of molten gel droplets to provide a population of solidified microgel droplets; breaking the emulsions of the solidified microgel droplets to provide a population of solidified microgels; exposing the population of solidified microgels in bulk to lysis conditions sufficient to lyse cells contained within the population of solidified microgels; purifying genomic DNA from cells contained within the population of solidified microgels in bulk to provide a population of solidified microgels including purified genomic DNA; encapsulating the population of solidified microgels including purified genomic DNA into droplets to provide a population of purified genomic DNA-containing droplets; and fragmenting the purified genomic DNA within the population of purified genomic DNA-containing droplets to provide a population of fragmented genomic DNA-containing droplets.

Barcoding Fragmented Genomic DNA

The disclosed methods may include a step of barcoding a population of solidified microgels including fragmented genomic DNA optionally including a common adapter sequence. Barcoding is performed such that the fragmented genomic DNA of each individual single cell is associated with an identifying barcode sequence, e.g., a single unique barcode sequence. In some embodiments, barcoding of the fragmented genomic DNA can be performed in a single step, for example, by incorporating the barcode sequences using a transposase, or in two steps, in which barcode sequences are added to the fragmented genomic DNA with, for example, ligase or overlap extension PCR.

In some embodiments, a population of solidified microgels including fragmented genomic DNA can be merged together with a library of barcode sequences, wherein each identifying barcode sequence (or population of an identifying barcode sequence), e.g., each unique barcode sequence (or population of a unique barcode sequence) of the library of barcode sequences is separately encapsulated in a droplet. Accordingly, in some embodiments, a method of sequencing single cell genomic DNA includes encapsulating the population of solidified microgels including fragmented genomic DNA into droplets to provide a population of fragmented genomic DNA-containing droplets. The population of fragmented genomic DNA-containing droplets may then be merged with a library of barcode sequence containing droplets such that each fragmented genomic DNA-containing droplet is merged with an identifying barcode sequence (or population of an identifying barcode sequence), e.g., a unique barcode sequence (or population of a unique barcode sequence) containing droplet. The method may further include subjecting the population of droplets containing both the fragmented genomic DNA and barcode sequence to conditions sufficient for enzymatic incorporation of the barcode sequence into the fragmented genomic DNA.

One approach for incorporating a barcode sequence into fragmented genomic DNA is to use primers that are complementary to the adapter sequences and the barcode sequences, such that the product amplicons of both fragmented genomic DNA and barcodes can anneal to one another and, via an extension reaction such as DNA polymerization, be extended onto one another, generating a double stranded product including the fragmented genomic DNA attached to the barcode sequence.

Alternatively or additionally, the primers that amplify the target can themselves be barcoded so that, upon annealing and extending onto the target, the amplicon produced has the barcode sequence incorporated into it. This can be applied with a number of amplification strategies, including specific amplification with PCR or non-specific amplification with, for example, multiple displacement amplification (MDA).

An alternative or additional enzymatic reaction that can be used to attach barcodes to fragmented genomic DNA is ligation, including blunt or sticky end ligation. In this approach, the DNA barcodes are incubated with the fragmented genomic DNA and ligase enzyme, resulting in the ligation of the barcode to the targets. The ends of the fragmented genomic DNA can be modified as needed for ligation by a number of techniques, including by using adaptors introduced with ligase or fragments to enable greater control over the number of barcodes added to the end of the molecule.

Yet another approach for adding the barcodes to the fragmented genomic DNA is to introduce them directly with a transposase or with a combination of enzymes, such as a non-specific endonuclease or combination of non-specific endonucleases (e.g., Fragmentase®) and ligase. For example, in this approach, barcodes can be synthesized that are compatible with a transposase. The transposase can then fragment the purified genomic DNA and add the barcodes to the ends of the fragment molecules, performing all steps of the reaction in one reaction. A combination of Fragmentase® and ligase can also be used, wherein the Fragmentase® is used to fragment the nucleic acids to a size suitable for sequencing, and the ligase is used to attach the barcodes to the fragment ends.

Accordingly, an example method of sequencing single cell genomic DNA as provided by the present disclosure includes: encapsulating a population of single cells in molten gel droplets to provide a population of molten gel droplets, wherein each molten gel droplet of the population contains zero or one cell; solidifying the population of molten gel droplets to provide a population of solidified microgel droplets; breaking the emulsions of the solidified microgel droplets to provide a population of solidified microgels; exposing the population of solidified microgels in bulk to lysis conditions sufficient to lyse cells contained within the population of solidified microgels; purifying genomic DNA from cells contained within the population of solidified microgels in bulk to provide a population of solidified microgels including purified genomic DNA; encapsulating the population of solidified microgels including purified genomic DNA into droplets to provide a population of purified genomic DNA-containing droplets; fragmenting the purified genomic DNA within the population of purified genomic DNA-containing droplets to provide a population of fragmented genomic DNA-containing droplets; and barcoding the fragmented genomic DNA or an amplification product thereof in the population of fragmented genomic DNA-containing droplets to provide a population of barcoded, fragmented genomic DNA-containing droplets.

In some embodiments, upon obtaining a population of barcoded, fragmented genomic DNA-containing droplets, the emulsion including the population of droplets is broken and the barcoded, fragmented DNA is purified to provide purified, barcoded, fragmented genomic DNA. An optional size selection step may occur to select for purified, barcoded, genomic DNA fragments of a certain size that permits their sequencing with existing sequencing platforms. Additional disclosure with respect to barcoding nucleic acids in droplets is provided in International Patent Application Publication No. WO2016/126871, the disclosure of which is incorporated by reference herein.

Molecular Amplification and Barcoding Via MALBAC

As an alternative to tagmentation/fragmentation, purified single-cell genomes in hydrogels can be subjected to a MALBAC (Multiple Annealing and Looping Based Amplification Cycles) amplification reaction in droplets by co-flowing the microgels with amplification reagents in a microfluidic dropmaker.

The MALBAC reaction is described generally in Zong et al. Genome-wide detection of single-nucleotide and copy-number variations of a single human cell, Science, 2012, the disclosure of which is incorporated by reference herein. Briefly, in a MALBAC reaction, degenerate primers anneal to genomic DNA and extend. In cycles 2 and later, hairpin loops form after extension and denaturation. These hairpins do not participate in the later cycles of the reaction as they are in a looped conformation. Following this “quasi-linear” amplification (6-10+ cycles), PCR with a single primer is used to amplify the looped products exponentially (10+ cycles).

The product of the initial quasi-linear reaction is looped amplicons. These loops can be barcoded, either by (1) fusion with a barcoded DNA fragment or (2) directly by insertion of a barcode sequence in the MALBAC primer. Here, two novel barcoding methods are described separately for clarity.

Method 1: Fusion of molecular barcode in droplets by SOE-PCR

The MALBAC reaction uses a single primer (from Zong et al. 2012):

(SEQ ID NO: 1)
5′-GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNNNN-3′
(27 bp handle followed by 8N)

The representative primer has a 27 bp handle and 8 degenerate bases, but other variants are possible.
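To make the primer layout concrete, the following sketch splits the primer into its 27-base handle and its 8 degenerate positions, counts how many distinct primers the degenerate region can encode (4 possibilities per position), and draws one example primer at random. The variable and function names are hypothetical, and the uniform random draw is an illustrative assumption about how degenerate bases are distributed.

```python
import random

HANDLE = "GTGAGTGATGGTTGAGGTAGTGTGGAG"  # handle portion of SEQ ID NO: 1
N_DEGENERATE = 8                        # number of degenerate (N) positions

def degenerate_complexity(n=N_DEGENERATE):
    """Number of distinct sequences an n-base degenerate region can take."""
    return 4 ** n

def random_primer(rng):
    """Return one concrete primer: the handle plus a random 8-base tail.

    Uniform sampling of A/C/G/T is an illustrative assumption.
    """
    tail = "".join(rng.choice("ACGT") for _ in range(N_DEGENERATE))
    return HANDLE + tail

print(len(HANDLE))               # length of the handle in bases
print(degenerate_complexity())   # distinct 8N tails
print(random_primer(random.Random(0)))
```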

The droplets containing purified DNA in microgels and MALBAC amplification mix are then thermal cycled, e.g., according to the following protocol:

95° C., 5 min
8 cycles of:
   20° C., 50 sec
   30° C., 50 sec
   40° C., 45 sec
   50° C., 45 sec
   65° C., 4 min
   95° C., 20 sec
   58° C., 20 sec
4° C., hold
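A thermal cycling protocol of this kind can be written down as data and checked before programming an instrument. The sketch below encodes the protocol above under two assumptions that should be verified against the original figure: that the first number of each step is a temperature in degrees Celsius, and that the bracketed 8 cycles span the seven steps from 20° C. through 58° C. It then totals the programmed hold time, ignoring ramp times and the final hold. All names are hypothetical.

```python
# Each step is (temperature in deg C, duration in seconds).
# Assumption: the 8-cycle bracket covers the seven steps listed in CYCLE.
INITIAL = [(95, 5 * 60)]
CYCLE = [
    (20, 50),
    (30, 50),
    (40, 45),
    (50, 45),
    (65, 4 * 60),
    (95, 20),
    (58, 20),
]
N_CYCLES = 8

def programmed_seconds(initial, cycle, n_cycles):
    """Total programmed hold time, excluding ramping and the final hold."""
    return sum(t for _, t in initial) + n_cycles * sum(t for _, t in cycle)

total = programmed_seconds(INITIAL, CYCLE, N_CYCLES)
print(f"{total} s (~{total / 60:.0f} min)")
```

Under these assumptions the programmed time is 4060 seconds, or about 68 minutes; actual run time will be longer because of temperature ramping.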

The amplicons in the droplets have the following structure. Note that the loop is single-stranded, while the 27 bp handle region is double-stranded:

5′-GTGAGTGATGGTTGAGGTAGTGTGGAG(SEQ ID NO: 2)-Loop
3′-CTCCACACTACCTCAACCATCACTCAC(SEQ ID NO: 3)-Loop
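The double-stranded handle region arises because the two handle sequences shown above are reverse complements of one another, so the two ends of a single strand can base-pair while the intervening sequence forms the loop. The following short check confirms this relationship for the two listed sequences; the function and constant names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

SEQ_ID_2 = "GTGAGTGATGGTTGAGGTAGTGTGGAG"
SEQ_ID_3 = "CTCCACACTACCTCAACCATCACTCAC"

# True when the second handle is the reverse complement of the first.
print(reverse_complement(SEQ_ID_2) == SEQ_ID_3)
```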

Separately, barcodes are made via digital PCR in microdroplets (e.g., as described herein). The double-stranded barcodes can have the following general structure:

5′-PCR_HANDLE -15N -GAT -3′
3′-PCR_HANDLE*-15N*-GAT*-5′

The PCR reaction may be asymmetric to favor production of the upper single strand by using an excess of PCR_HANDLE primer (e.g., in a 1 to 5, 10, 20, etc. ratio compared to the limiting primer). The PCR_HANDLE primer may also be functionalized with biotin (or other biomolecule) to facilitate downstream capture of barcoded DNA fragments using streptavidin-coated beads.

The digital barcode droplets are microfluidically merged, e.g., in a 1:1 ratio, with the droplets containing the single-cell MALBAC amplicons. A single cycle of PCR is used to anneal and extend the barcode fragments onto the MALBAC amplicons using their complementary overlapping regions (FIG. 17). Alternatively, multiple PCR cycles can be performed and the barcoded genomic DNA amplified in droplets using primers situated at the 5′ ends of the barcoded strands (FIG. 18). In this variant, leftover MALBAC primers can be digested with a single-stranded exonuclease (e.g. Exonuclease I or similar). Primers for PCR amplification can be modified (e.g. via 3′ phosphorothioate bonds or similar) to protect against degradation by the exonuclease.

Example PCR protocol for single-cycle extension:

95° C., 1 min
60° C., 30 sec
72° C., 5 min
12° C., hold

Example PCR protocol for extension and exponential amplification:

95° C., 1 min
30 cycles of:
   95° C., 20 sec
   55° C., 30 sec
   72° C., 3 min
72° C., 5 min
12° C., hold

Droplets are broken and the barcoded fragments are enriched via PCR (in the case of single-cycle extension in droplets). Size-selection using, e.g., SPRI beads, gel electrophoresis, etc., is used to remove single-stranded DNA barcodes and primer dimers from the amplified product.

Downstream library preparation for next-generation sequencing can then proceed according to the specifications by the sequencing platform. For Illumina sequencing-by-synthesis (SBS) chemistry, the barcoded dsDNA is fragmented (enzymatically, mechanically, etc.), adapters are added (by ligation, Tn5 [Nextera] transposition, etc.), and the library is amplified by PCR and sequenced.

Method 2: Direct barcoding of MALBAC primers

The barcoded MALBAC reaction uses a library of primers (modified from Zong et al. 2012):

(SEQ ID NO: 4)
5′-GTGAGTGATGGTTGAGGTAGTGTGGAG[BARCODE]NNNNNNNN-3′
(27 bp handle, followed by the barcode region, followed by 8N)

The predefined barcode oligonucleotides (4-12+ bp barcode region) are emulsified in droplets to generate a library of microdroplets each containing micromolar-scale concentrations of a single primer variant. When looped as MALBAC amplicons, the primers form a combinatorically barcoded construct (FIG. 19).

The purified DNA in a microgel is then merged microfluidically with MALBAC amplification reagent and two primer-containing droplets (FIG. 20). The MALBAC reaction is performed with a thermal cycling protocol identical to Method 1. The resulting looped amplicons have the following structure:

5′-GTGAGTGATGGTTGAGGTAGTGTGGAG(SEQ ID NO: 2)-[BARCODE1]-Loop
3′-CTCCACACTACCTCAACCATCACTCAC(SEQ ID NO: 3)-[BARCODE2]-Loop

Barcode sequences 1 and 2 form a unique combinatorial identifier for the MALBAC loop.
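The benefit of a combinatorial identifier is that a modest primer library yields many more identifiers than it has members. The sketch below counts the distinct barcodes available for a given barcode length and the number of two-barcode identifiers obtainable from a library of a given size. It assumes, as an illustration not stated above, that an identifier is an unordered pair of barcodes and that the same barcode may be drawn twice. The function names and the example library size of 96 are hypothetical.

```python
def max_barcodes(length_bp):
    """Upper bound on distinct barcodes of a given length (4 bases per position)."""
    return 4 ** length_bp

def pair_identifiers(library_size):
    """Number of unordered barcode pairs, allowing the same barcode twice.

    Treating the identifier as an unordered pair is an assumption.
    """
    return library_size * (library_size + 1) // 2

print(max_barcodes(4), max_barcodes(12))  # bounds for 4 bp and 12 bp barcodes
print(pair_identifiers(96))               # identifiers from a 96-member library
```

In practice a usable library is smaller than `max_barcodes` suggests, since barcodes are usually chosen to differ from one another by more than one base.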

After thermal cycling in droplets, the emulsions are broken and PCR is carried out to amplify the barcoded amplicons and create a double-stranded barcoded DNA library. The primers used for exponential amplification may also contain barcodes for sample multiplexing.

Preparation for NGS depends on the platform. However, because the barcodes are combinatorial and located at both the 5′ and 3′ ends of the construct, the dsDNA should not be fragmented during the library preparation steps, since fragmentation would separate the two barcodes from one another.

Generating a Database of Single Cell Genome Sequencing Reads

The methods described herein may include a step of sequencing the purified, barcoded, fragmented genomic DNA. DNA sequencing can be achieved with commercially available next generation sequencing (NGS) platforms, including platforms that perform sequencing by synthesis, sequencing by ligation, pyrosequencing, using reversible terminator chemistry, using phospholinked fluorescent nucleotides, or real-time sequencing. For example, the purified, barcoded, fragmented genomic DNA may be sequenced on an Illumina MiSeq platform using a custom index primer.

Accordingly, an example method of sequencing single cell genomic DNA as provided by the present disclosure includes: encapsulating a population of single cells in molten gel droplets to provide a population of molten gel droplets, wherein each molten gel droplet of the population contains zero or one cell; solidifying the population of molten gel droplets to provide a population of solidified microgel droplets; breaking the emulsions of the solidified microgel droplets to provide a population of solidified microgels; exposing the population of solidified microgels in bulk to lysis conditions sufficient to lyse cells contained within the population of solidified microgels; purifying genomic DNA from cells contained within the population of solidified microgels in bulk to provide a population of solidified microgels including purified genomic DNA; encapsulating the population of solidified microgels including purified genomic DNA into droplets to provide a population of purified genomic DNA-containing droplets; fragmenting the purified genomic DNA within the population of purified genomic DNA-containing droplets to provide a population of fragmented genomic DNA-containing droplets; barcoding the fragmented genomic DNA or an amplification product thereof in the population of fragmented genomic DNA-containing droplets to provide a population of barcoded, fragmented genomic DNA-containing droplets; purifying barcoded, fragmented genomic DNA from the barcoded, fragmented genomic DNA-containing droplets to provide purified, barcoded, fragmented genomic DNA; and sequencing the purified, barcoded, fragmented genomic DNA.

Raw sequencing reads obtained from the commercial NGS platform can be filtered by quality and grouped by barcode sequence using any suitable scripts known in the art, e.g., Python script barcodeCleanup.py. In some embodiments, a given sequencing read may be discarded if more than about 20% of its bases have a quality score (Q-score) less than Q20, where Q20 indicates a base call accuracy of about 99%. In some embodiments, a given sequencing read may be discarded if more than about 5%, about 10%, about 15%, about 20%, about 25%, or about 30% of its bases have a Q-score less than Q10, Q20, Q30, Q40, Q50, Q60, or more, indicating a base call accuracy of about 90%, about 99%, about 99.9%, about 99.99%, about 99.999%, about 99.9999%, or more, respectively.
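The quality filter described above can be sketched as follows. This is a minimal illustration only and is not the barcodeCleanup.py script; the function names, the Phred+33 encoding of the quality string, and the handling of empty reads are assumptions made for the example.

```python
def phred_accuracy(q: int) -> float:
    """Base call accuracy implied by a Phred quality score Q."""
    return 1.0 - 10.0 ** (-q / 10.0)


def decode_phred33(quality_string: str) -> list:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in quality_string]


def keep_read(qualities, min_q: int = 20, max_low_fraction: float = 0.20) -> bool:
    """Return True if the read passes the quality filter.

    A read is discarded when more than `max_low_fraction` of its bases
    have a quality score below `min_q`.
    """
    if not qualities:
        return False  # assumption: empty reads are discarded
    low = sum(1 for q in qualities if q < min_q)
    return low / len(qualities) <= max_low_fraction
```

With the default arguments, a read is kept when at most 20% of its bases fall below Q20; other thresholds from the ranges recited above may be passed as `min_q` and `max_low_fraction`.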

In some embodiments, all sequencing reads associated with a barcode group containing fewer than 50 reads may be discarded to ensure that all barcode groups, representing single cells, contain a sufficient number of high-quality reads. In some embodiments, all sequencing reads associated with a barcode group containing fewer than a threshold number of reads, e.g., 30, 40, 50, 60, 70, 80, 90, 100 or more reads, may be discarded to ensure the quality of the barcode groups representing single cells.
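As a hedged sketch of the barcode group size filter, the following groups reads by barcode and drops small groups. The input format (an iterable of barcode and sequence pairs) and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_barcode(reads):
    """Group (barcode, sequence) pairs into a dict of barcode -> list of sequences."""
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


def filter_groups(groups, min_reads: int = 50):
    """Keep only barcode groups containing at least `min_reads` reads."""
    return {bc: seqs for bc, seqs in groups.items() if len(seqs) >= min_reads}
```

Here `min_reads=50` corresponds to discarding groups with fewer than 50 reads; any of the other recited thresholds may be substituted.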

Once the raw sequencing reads are filtered by quality and grouped by barcode sequence, the sequences may be exported to a table, e.g., a table in a relational database, e.g., a SQLite database, including fields pertinent to identifying sequencing reads that are obtained from a single cell. In some embodiments, the sequences may be exported to a table in a relational database, e.g., a SQLite database, including fields containing the barcode sequence, barcode group size, a unique read ID number, and read sequence. In some embodiments, e.g., in the case of analyzing a synthetic cell population (see, Experimental section), sequencing reads may be aligned using any suitable available software program, e.g., bowtie2 v2.2.9, and the SQLite table may be updated with relevant alignment information for each sequencing read. In some embodiments, e.g., when analyzing environmental samples (see, Experimental section), the sequencing reads may be classified by taxonomy using any suitable available software program, e.g., Kraken v0.10.5, and the SQLite table may be updated with relevant taxonomic information for each sequencing read. In some embodiments, barcode group purity may be calculated from reference alignment data or phylogenetic labels using any suitable available script, e.g., Python script purity.py.
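One possible layout for such a table is sketched below using Python's standard sqlite3 module. The table name, column names, and helper functions are hypothetical, and this is not the purity.py script. In particular, the purity definition used here (the fraction of labeled reads in a barcode group that share the most common label) is an assumption made for the example.

```python
import sqlite3


def build_read_table(conn, groups):
    """Create a `reads` table and insert one row per read.

    `groups` maps barcode -> list of read sequences.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads ("
        " read_id INTEGER PRIMARY KEY,"   # unique read ID number
        " barcode TEXT NOT NULL,"         # barcode sequence
        " group_size INTEGER NOT NULL,"   # number of reads sharing the barcode
        " sequence TEXT NOT NULL,"        # read sequence
        " label TEXT)"                    # alignment or taxonomic label, filled later
    )
    rows = [(bc, len(seqs), seq) for bc, seqs in groups.items() for seq in seqs]
    conn.executemany(
        "INSERT INTO reads (barcode, group_size, sequence) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def set_label(conn, read_id, label):
    """Record the reference or taxon assigned to a read by an external tool."""
    conn.execute("UPDATE reads SET label = ? WHERE read_id = ?", (label, read_id))
    conn.commit()


def barcode_purity(conn, barcode):
    """Fraction of labeled reads in a group that carry the most common label."""
    rows = conn.execute(
        "SELECT COUNT(*) FROM reads"
        " WHERE barcode = ? AND label IS NOT NULL GROUP BY label",
        (barcode,),
    ).fetchall()
    if not rows:
        return None  # no labeled reads for this barcode
    counts = [row[0] for row in rows]
    return max(counts) / sum(counts)
```

The `label` column would be populated from the output of an aligner or taxonomic classifier; parsing that output is outside the scope of this sketch.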

Utility

The present disclosure provides methods for sequencing single cell genomic DNA from a population of single cells. Methods provided herein may find use in copy number variant analysis for cancer. For example, in cancer, cells undergo rapid evolution that leads to massive mutation of their genomes. One form of mutation is copy number variation, in which different parts of the genome are duplicated or erased. In some embodiments, methods provided herein allow for the counting of known sequences across the genome. An advantage of using a method provided herein is that copy number variant analysis can be performed with low coverage of the genome, often less than 1%, allowing it to be measured without having to perform large amounts of sequencing.
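To illustrate how counting reads across the genome can indicate copy number at low coverage, the following sketch bins aligned read positions for one barcode group and normalizes each bin to the median occupied bin. The fixed-size binning and median normalization are assumptions chosen for simplicity, not a procedure taken from this disclosure.

```python
from collections import Counter
from statistics import median


def bin_counts(positions, genome_length: int, bin_size: int):
    """Count aligned read start positions (0-based) in fixed-size genomic bins."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in positions)
    return [counts.get(i, 0) for i in range(n_bins)]


def relative_copy_number(counts):
    """Scale bin counts by the median of the non-empty bins."""
    occupied = [c for c in counts if c > 0]
    if not occupied:
        return [0.0] * len(counts)
    baseline = median(occupied)
    return [c / baseline for c in counts]
```

A bin whose relative value is near 2.0 would be consistent with a duplication relative to the bulk of the genome, and a value near 0 with a deletion, subject to sampling noise at low read counts.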

In some embodiments, methods provided herein may find use in blood sequencing, wherein blood sequencing includes sequencing all cells of interest in a blood sample (e.g., immune cells). For example, a blood sample may be collected and the relevant cell types extracted. Immune cells sample the circulatory system, and as such, their biological state may be representative of disease, e.g., infection, sepsis, cancer, autoimmune disorders, etc. In some embodiments, sequencing the nucleic acids of immune cells from a blood sample may allow detection of a variety of diseased states. In some embodiments, rare cells such as circulating tumor cells and circulating fetal cells may be detected using the methods provided herein. In some embodiments, while the majority of sequenced cells from a blood sample containing circulating tumor and/or fetal cells will not correspond to the cell population of interest, when any such circulating tumor and/or fetal cell is identified, complete information about its genome may be recovered.

In some embodiments, methods provided herein may find use in the field of metagenomics. For example, methods provided herein may find use in studying diverse microbial systems, wherein their analysis may be valuable in identifying rare system members and allowing them to be recovered for detailed study, e.g., to recover their nucleic acid sequences.

In some embodiments, methods provided herein may find use in studying latent human immunodeficiency virus (HIV) infection. Using methods as described herein, cells from an infected individual may be sorted based on the presence of HIV genes. Sorted cells may be sequenced to recover information about how their genome may be modulated by the virus. Those of skill in the art will be able to envision the power of using the methods described herein and will be able to adapt the methods for use in any application.

While the methods provided herein have been described primarily with respect to cellular genomic DNA, it should be noted that the SiC-seq method also provides a means to isolate and barcode large DNA molecules, irrespective of the entity from which they originate. While the described methods have focused on cells, similar approaches can be applied to viruses whose genomes can be trapped and processed within the gel matrix.

Thus, for example, in some embodiments the present disclosure provides a method of sequencing genomic DNA, e.g., viral genomic DNA, the method including encapsulating a population of biological entities, e.g., viruses, in molten gel droplets to provide a population of molten gel droplets, wherein each molten gel droplet of the population contains zero or one biological entity; solidifying the population of molten gel droplets to provide a population of solidified microgel droplets; breaking the emulsions of the solidified microgel droplets to provide a population of solidified microgels; exposing the population of solidified microgels in bulk to lysis conditions sufficient to lyse biological entities contained within the population of solidified microgels; purifying genomic DNA from biological entities contained within the population of solidified microgels in bulk to provide a population of solidified microgels comprising purified genomic DNA; encapsulating the population of solidified microgels comprising purified genomic DNA into droplets to provide a population of purified genomic DNA-containing droplets; fragmenting the purified genomic DNA within the population of purified genomic DNA-containing droplets to provide a population of fragmented genomic DNA-containing droplets; barcoding the fragmented genomic DNA or an amplification product thereof in the population of fragmented genomic DNA-containing droplets to provide a population of barcoded, fragmented genomic DNA-containing droplets; purifying barcoded, fragmented genomic DNA from the barcoded, fragmented genomic DNA-containing droplets to provide purified, barcoded, fragmented genomic DNA; and sequencing the purified, barcoded, fragmented genomic DNA. Similarly, it should be understood that each of the non-limiting aspects of the disclosure numbered 1-36 as provided below may be modified to refer to a suitable biological entity, e.g., a virus, rather than a cell.

Each of the methods described herein may be modified as appropriate for use with non-cell based biological entities or large DNA molecules.

Devices and Systems

As indicated above, embodiments of the disclosed subject matter employ systems and/or devices including microfluidic devices. Devices of the subject disclosure include all those described above in association with the subject methods. Microfluidic devices of this disclosure may be characterized in various ways.

In some aspects, for example, microfluidic systems and/or devices are provided which include one or more droplet makers, configured to generate droplets, as described herein, and/or one or more flow channels. In some aspects, the one or more flow channels are operably connected, e.g., fluidically connected, to the one or more droplet makers and/or are configured to receive one or more droplets therefrom. By “operably connected” and “operably coupled”, as used herein, is meant connected in a specific way (e.g., in a manner allowing fluid, e.g., water, to move and/or electric power to be transmitted) that allows a disclosed system or device and its various components to operate effectively in the manner described herein.

As noted above, microfluidic devices may include one or more flow channels, e.g., flow channels which droplets may pass into, out of, and/or through. In certain embodiments, flow channels are one or more “micro” channels. Such channels may have at least one cross-sectional dimension on the order of a millimeter or smaller (e.g., less than or equal to about 1 millimeter). For certain applications, this dimension may be adjusted; in some embodiments the at least one cross-sectional dimension is about 500 micrometers or less. In some embodiments, the cross-sectional dimension is about 100 micrometers or less, or about 10 micrometers or less, and sometimes about 1 micrometer or less. A cross-sectional dimension is one that is generally perpendicular to the direction of centerline flow, although it should be understood that when encountering flow through elbows or other features that tend to change flow direction, the cross-sectional dimension in play need not be strictly perpendicular to flow. It should also be understood that in some embodiments, a micro-channel may have two or more cross-sectional dimensions such as the height and width of a rectangular cross-section or the major and minor axes of an elliptical cross-section. Either of these dimensions may be compared against sizes presented here. Note that micro-channels employed in this disclosure may have two dimensions that are grossly disproportionate, e.g., a rectangular cross-section having a height of about 100-200 micrometers and a width on the order of a centimeter or more. Of course, certain devices may employ channels in which the two or more axes are very similar or even identical in size (e.g., channels having a square or circular cross-section).

Microfluidic devices, in some embodiments of this disclosure, are fabricated using microfabrication technology. Such technology may be employed to fabricate integrated circuits (ICs), microelectromechanical devices (MEMS), display devices, and the like. Among the types of microfabrication processes that can be employed to produce small dimension patterns in microfluidic device fabrication are photolithography (including X-ray lithography, e-beam lithography, etc.), self-aligned deposition and etching technologies, anisotropic deposition and etching processes, self-assembling mask formation (e.g., forming layers of hydrophobic-hydrophilic copolymers), etc.

In view of the above, it should be understood that some of the principles and design features described herein can be scaled to larger devices and systems including devices and systems employing channels reaching the millimeter or even centimeter scale channel cross-sections. Thus, when describing some devices and systems as “microfluidic,” it is intended that the description apply equally, in certain embodiments, to some larger scale devices.

When referring to a microfluidic “device” it is generally intended to represent a single entity in which one or more channels, reservoirs, stations, etc. share a continuous substrate, which may or may not be monolithic. Aspects of microfluidic devices include the presence of one or more fluid flow paths, e.g., channels, having dimensions as discussed herein. A microfluidics “system” may include one or more microfluidic devices and associated fluidic connections, electrical connections, control/logic features, etc.

Systems may also include one or more of: (a) a temperature control module for controlling the temperature of one or more portions of the subject devices and/or droplets therein and which is operably connected to the microfluidic device(s), (b) a detection means, i.e., a detector, e.g., an optical imager, operably connected to the microfluidic device(s), (c) an incubator, e.g., a cell incubator, operably connected to the microfluidic device(s), and (d) a sequencer operably connected to the microfluidic device(s). The subject systems may also include one or more conveyors configured to move, e.g., convey, a substrate from a first droplet receiving position to one or more of (a)-(d).

The subject devices and systems include one or more sorters for sorting droplets into one or more flow channels. Such a sorter may sort and distribute droplets based on one or more characteristics of the droplets including composition, size, shape, buoyancy, or other characteristics.

Aspects of the devices also include one or more detection means, i.e., a detector, e.g., an optical imager, configured for detecting the presence of one or more droplets, or one or more characteristics thereof, including their composition. In some embodiments, detection means are configured to recognize one or more components of one or more droplets in one or more flow channels.

In various embodiments, microfluidic devices of this disclosure provide a continuous flow of a fluid medium. Fluid flowing through a channel in a microfluidic device exhibits many unique properties. Typically, the dimensionless Reynolds number is extremely low, resulting in flow that always remains laminar. Further, in this regime, two fluids joining will not easily mix, and diffusion alone may drive the mixing of two compounds.
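As a rough worked example of why the Reynolds number is low at these scales, the sketch below evaluates Re = ρvD/μ for a rectangular channel, using the hydraulic diameter as the characteristic length. The fluid properties (approximately those of water) and the flow speed are illustrative assumptions and are not values taken from this disclosure.

```python
def hydraulic_diameter(width_m: float, height_m: float) -> float:
    """Hydraulic diameter of a rectangular channel: 2wh / (w + h)."""
    return 2.0 * width_m * height_m / (width_m + height_m)


def reynolds_number(density, velocity, length, viscosity) -> float:
    """Re = rho * v * L / mu, with all quantities in SI units."""
    return density * velocity * length / viscosity


# Assumed example: water-like fluid in a 100 um x 100 um channel at 1 cm/s.
d_h = hydraulic_diameter(100e-6, 100e-6)          # about 1e-4 m
re = reynolds_number(1000.0, 0.01, d_h, 1.0e-3)   # about 1
```

Under these assumptions the Reynolds number is on the order of 1, far below the values of roughly a few thousand at which pipe flow typically becomes turbulent.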

In addition, the subject devices, in some embodiments, include one or more temperature and/or pressure control modules. Such a module may be capable of modulating temperature and/or pressure of a carrier fluid in one or more flow channels of a device. More specifically, a temperature control module may be one or more thermal cyclers.

An exemplary embodiment is described with reference to FIG. 1, Panels A, B and C, which depict schematics of microfluidic devices that may be used to a) generate barcode droplets and encapsulate cells in microgels (e.g., agarose microgels), b) re-encapsulate microgels with fragmenting and/or tagging reagents, e.g., tagmentation reagents, and c) merge microgel droplets with barcode droplets and PCR droplets. As shown in FIG. 1, Panel A, single cells are packaged into molten gel droplets and are encapsulated and spaced by a carrier fluid, e.g., oil, to provide encapsulated single cells. Molten gel droplet formation may be slow, due to the viscosity of the mix and interfacial tension properties. To speed the formation of molten gel droplets, bubble triggering may be used (Abate, A. R., and Weitz, D. A., Lab Chip, 11(10):1713-1716, 2011), the disclosure of which is incorporated by reference herein. In some embodiments, molten gel solution and cells are co-flowed into a droplet generator under jetting flow conditions. Oil is also introduced under jetting flow conditions such that a jet is formed of aqueous phase in oil. Bubble triggering introduces air bubbles near the jet, either by injecting air bubbles or by injecting an air stream alongside the jet. In some cases, the air bubbles perturb the jet, causing it to break into droplets. If air bubbles are periodically spaced, droplets of a uniform size will form between the air bubbles, increasing the rate of monodisperse droplet generation.

Accordingly, in some embodiments, the present disclosure provides a system including a microfluidic device, a molten gel reservoir and a heating element, the microfluidic device including a co-flow droplet maker including: a first input channel configured to provide a plurality of cells to a flow channel; a second input channel configured to provide a molten gel flow to the flow channel from the gel reservoir, wherein the heating element is positioned in proximity to the gel reservoir and configured to apply heat to the gel reservoir sufficient to maintain a molten gel in the molten gel reservoir in a molten state; and a third input channel and a fourth input channel positioned on opposite sides of the flow channel and downstream of the first and second input channels, wherein the third and fourth input channels are configured to provide immiscible phase fluid flows to the flow channel.

Now referring to FIG. 1, Panel B, a device that may be of use to carry out the methods of the present disclosure includes a microfluidic device for re-encapsulating microgels with fragmenting and/or tagging reagents, e.g., tagmentation reagents. As shown in FIG. 1, Panel B, microgels are combined with fragmenting and/or tagging reagents, e.g., tagmentation reagents, and are encapsulated and spaced by a carrier fluid, e.g., oil, to provide encapsulated microgels. Now referring to FIG. 1, Panel C, a device that may be of use to carry out the methods of the present disclosure is provided, which device may further include reservoirs for incorporating reagents (e.g., PCR reagents, cell lysis reagents, etc.) and for incorporating barcode nucleic acid sequences, and in which liquid electrodes, moats, droplet mergers and the reservoirs for the spacing carrier fluid, e.g., oil, barcode droplets, and PCR droplets are identified.

Accordingly, in some embodiments, the present disclosure provides a device including components for (a) introducing two or more populations of droplets into a flow channel, (i) wherein the flow channel includes a droplet merger section associated with one or more electrodes or one or more portions of one or more electrodes configured to apply an electric field in the droplet merger section of the flow channel, (ii) wherein the two or more populations of droplets are introduced into the flow channel at a single junction from two or more separate inlet channels, respectively, and (iii) wherein the two or more populations of droplets are introduced into the flow channel such that the droplet inputs from each inlet channel at least partially synchronize due to hydrodynamic effects, resulting in the ejection of spaced groups of droplets, in which at least some of the spaced groups of droplets include a droplet from each of the two or more populations of droplets; (b) flowing the spaced groups of droplets into the droplet merger section; and (c) merging droplets within a spaced group by applying an electric field in the droplet merger section of the flow channel using the one or more electrodes or the one or more portions of the one or more electrodes.

Exemplary Non-Limiting Aspects of the Disclosure

Aspects, including embodiments, of the present subject matter described above may be beneficial alone or in combination, with one or more other aspects or embodiments. Without limiting the foregoing description, certain non-limiting aspects (Set A and Set B) of the disclosure are provided below. As will be apparent to those of ordinary skill in the art upon reading this disclosure, each of the individually numbered aspects may be used or combined with any of the preceding or following individually numbered aspects. This is intended to provide support for all such combinations of aspects and is not limited to combinations of aspects explicitly provided below:

Set A

1. A method of sequencing single cell genomic DNA, the method comprising:

-   encapsulating a population of single cells in molten gel droplets to provide a population of molten gel droplets, wherein each molten gel droplet of the population contains zero or one cell;
-   solidifying the population of molten gel droplets to provide a population of solidified microgel droplets;
-   breaking the emulsions of the solidified microgel droplets to provide a population of solidified microgels;
-   exposing the population of solidified microgels in bulk to lysis conditions sufficient to lyse cells contained within the population of solidified microgels;
-   purifying genomic DNA from cells contained within the population of solidified microgels in bulk to provide a population of solidified microgels comprising purified genomic DNA;
-   encapsulating the population of solidified microgels comprising purified genomic DNA into droplets to provide a population of purified genomic DNA-containing droplets;
-   fragmenting the purified genomic DNA within the population of purified genomic DNA-containing droplets to provide a population of fragmented genomic DNA-containing droplets;
-   barcoding the fragmented genomic DNA or an amplification product thereof in the population of fragmented genomic DNA-containing droplets to provide a population of barcoded, fragmented genomic DNA-containing droplets;
-   purifying barcoded, fragmented genomic DNA from the barcoded, fragmented genomic DNA-containing droplets to provide purified, barcoded, fragmented genomic DNA; and
-   sequencing the purified, barcoded, fragmented genomic DNA.

2. The method of 1, wherein the barcoding comprises merging each of the fragmented genomic DNA-containing droplets with a barcode containing droplet.

3. The method of 2, wherein each of the barcode containing droplets comprises a unique nucleic acid barcode sequence.

4. The method of any one of 1 to 3, wherein the method comprises incorporating an adaptor nucleic acid sequence into the fragmented genomic DNA.

5. The method of any one of 1 to 4, wherein the population of single cells comprises eukaryotic cells.

6. The method of 5, wherein the population of single cells comprises mammalian cells.

7. The method of any one of 1 to 4, wherein the population of single cells comprises bacterial cells.

8. The method of any one of 1 to 5, wherein the population of single cells comprises fungal cells.

9. The method of any one of 1 to 8, wherein the molten gel droplet comprises a hydrogel polymer.

10. The method of 9, wherein the hydrogel polymer comprises a thermoresponsive polymer.

11. The method of 10, wherein the thermoresponsive polymer is agarose.

12. The method of any one of 1 to 11, wherein the solidifying comprises cooling the population of molten gel droplets.

13. The method of any one of 1 to 9, wherein the molten gel droplet comprises polyethylene glycol (PEG).

14. The method of 13, wherein the solidifying comprises chemically crosslinking the PEG.

15. The method of 13, wherein the solidifying comprises photo-crosslinking the PEG.

16. The method of any one of 1 to 9, wherein the molten gel droplet comprises acrylamide.

17. The method of 16, wherein the solidifying comprises chemically crosslinking the acrylamide.

18. The method of 16, wherein the solidifying comprises photo-crosslinking the acrylamide.

19. The method of any one of 1 to 9, wherein the molten gel droplet comprises alginate.

20. The method of 19, wherein the solidifying comprises adding calcium to the molten gel droplet.

21. The method of any one of 1 to 20, wherein the solidified microgels comprise pores sized to retain genomic DNA within the solidified microgels.

22. The method of any one of 1 to 21, wherein the step of encapsulating the population of single cells in molten gel droplets comprises the addition of an oil.

23. The method of any one of 1 to 22, wherein the exposing comprises contacting the population of solidified microgels in bulk with a lytic enzyme to lyse cells contained within the population of solidified microgels.

24. The method of 23, wherein the lytic enzyme is selected from zymolyase, lysostaphin, mutanolysin, lysozyme, or a combination thereof.

25. The method of any one of 1 to 24, wherein the step of purifying genomic DNA from cells contained within the population of solidified microgels comprises contacting the population of solidified microgels with a detergent to solubilize cellular material contained within the population of solidified microgels.

26. The method of 25, wherein the detergent is selected from lithium dodecyl sulfate, sodium dodecyl sulfate, or a combination thereof.

27. The method of any one of 1 to 26, wherein the step of purifying genomic DNA from cells contained within the population of solidified microgels comprises contacting the population of solidified microgels with a protease to digest cellular proteins contained within the population of solidified microgels.

28. The method of 27, wherein the protease is proteinase K.

29. The method of any one of 1 to 28, wherein the step of purifying genomic DNA from cells contained within the population of solidified microgels comprises a step of washing the population of solidified microgels, wherein the step of washing the population of solidified microgels comprises contacting the population of solidified microgels with a washing buffer.

30. The method of any one of 1 to 29, wherein each of the population of purified genomic DNA-containing droplets comprise a complex comprising a transposase and a transposon.

31. The method of any one of 1 to 30, wherein the step of fragmenting comprises contacting the purified genomic DNA with a complex comprising a transposase and a transposon.

32. The method of 31, wherein the complex comprises a transposon that comprises an adapter sequence.

33. The method of 32, wherein contacting the purified nucleic acids with the complex provides fragmented genomic DNA comprising the adapter sequence.

34. The method of any one of 1 to 33, wherein the step of encapsulating a population of single cells in molten gel droplets and the step of barcoding the fragmented genomic DNA or an amplification product thereof are performed using a microfluidic device.

35. The method of any one of 1 to 34, wherein one or more of the steps of

-   solidifying the population of molten gel droplets to provide a population of solidified microgel droplets;
-   breaking the emulsions of the solidified microgel droplets to provide a population of solidified microgels;
-   exposing the population of solidified microgels in bulk to lysis conditions sufficient to lyse cells contained within the population of solidified microgels;
-   purifying genomic DNA from cells contained within the population of solidified microgels in bulk to provide a population of solidified microgels comprising purified genomic DNA; and
-   fragmenting the purified genomic DNA within the population of solidified microgels comprising purified genomic DNA in bulk to provide a population of solidified microgels comprising fragmented genomic DNA,

are not performed using a microfluidic device.

36. The method of any one of 1 to 35, wherein the population of single cells is a heterogeneous population of single celled microorganisms.

37. A system comprising a microfluidic device, a molten gel reservoir and a heating element, the microfluidic device comprising a co-flow droplet maker comprising

-   a first input channel configured to provide a plurality of cells to a flow channel,
-   a second input channel configured to provide a molten gel flow to the flow channel from the gel reservoir, wherein the heating element is positioned in proximity to the gel reservoir and configured to apply heat to the gel reservoir sufficient to maintain a molten gel in the molten gel reservoir in a molten state, and
-   a third input channel and a fourth input channel positioned on opposite sides of the flow channel and downstream of the first and second input channels, wherein the third and fourth input channels are configured to provide immiscible phase fluid flows to the flow channel.

Set B

1. A method of sequencing single cell genomic DNA, the method comprising:

-   encapsulating a population of single cells in molten gel droplets to provide a population of molten gel droplets, wherein each molten gel droplet of the population contains zero or one cell;
-   solidifying the population of molten gel droplets to provide a population of solidified microgel droplets;
-   breaking the emulsions of the solidified microgel droplets to provide a population of solidified microgels;
-   exposing the population of solidified microgels in bulk to lysis conditions sufficient to lyse cells contained within the population of solidified microgels;
-   purifying genomic DNA from cells contained within the population of solidified microgels in bulk to provide a population of solidified microgels comprising purified genomic DNA;
-   encapsulating the population of solidified microgels comprising purified genomic DNA into droplets to provide a population of purified genomic DNA-containing droplets;
-   barcoding the genomic DNA or one or more amplification products thereof to provide a population of barcoded, genomic DNA-containing droplets;
-   purifying barcoded, genomic DNA from the barcoded, genomic DNA-containing droplets to provide purified, barcoded, genomic DNA; and
-   sequencing the purified, barcoded, genomic DNA.

2. The method of 1, wherein the barcoding comprises merging each of the purified genomic DNA-containing droplets with a barcode containing droplet.

3. The method of 2, wherein each of the barcode containing droplets comprises a unique nucleic acid barcode sequence.

4. The method of any one of 1 to 3, comprising performing a MALBAC amplification reaction in the purified genomic DNA-containing droplets, e.g., as described herein with regard to molecular amplification and barcoding via MALBAC.

5. The method of any one of 1 to 4, wherein the population of single cells comprises eukaryotic cells.

6. The method of 5, wherein the population of single cells comprises mammalian cells.

7. The method of any one of 1 to 4, wherein the population of single cells comprises bacterial cells.

8. The method of any one of 1 to 5, wherein the population of single cells comprises fungal cells.

9. The method of any one of 1 to 8, wherein the molten gel droplet comprises a hydrogel polymer.

10. The method of 9, wherein the hydrogel polymer comprises a thermoresponsive polymer.

11. The method of 10, wherein the thermoresponsive polymer is agarose.

12. The method of any one of 1 to 11, wherein the solidifying comprises cooling the population of molten gel droplets.

13. The method of any one of 1 to 9, wherein the molten gel droplet comprises polyethylene glycol (PEG).

14. The method of 13, wherein the solidifying comprises chemically crosslinking the PEG.

15. The method of 13, wherein the solidifying comprises photo-crosslinking the PEG.

16. The method of any one of 1 to 9, wherein the molten gel droplet comprises acrylamide.

17. The method of 16, wherein the solidifying comprises chemically crosslinking the acrylamide.

18. The method of 16, wherein the solidifying comprises photo-crosslinking the acrylamide.

19. The method of any one of 1 to 9, wherein the molten gel droplet comprises alginate.

20. The method of 19, wherein the solidifying comprises adding calcium to the molten gel droplet.

21. The method of any one of 1 to 20, wherein the solidified microgels comprise pores sized to retain genomic DNA within the solidified microgels.

22. The method of any one of 1 to 21, wherein the step of encapsulating the population of single cells in molten gel droplets comprises the addition of an oil.

23. The method of any one of 1 to 22, wherein the exposing comprises contacting the population of solidified microgels in bulk with a lytic enzyme to lyse cells contained within the population of solidified microgels.

24. The method of 23, wherein the lytic enzyme is selected from zymolyase, lysostaphin, mutanolysin, lysozyme, or a combination thereof.

25. The method of any one of 1 to 24, wherein the step of purifying genomic DNA from cells contained within the population of solidified microgels comprises contacting the population of solidified microgels with a detergent to solubilize cellular material contained within the population of solidified microgels.

26. The method of 25, wherein the detergent is selected from lithium dodecyl sulfate, sodium dodecyl sulfate, or a combination thereof.

27. The method of any one of 1 to 26, wherein the step of purifying genomic DNA from cells contained within the population of solidified microgels comprises contacting the population of solidified microgels with a protease to digest cellular proteins contained within the population of solidified microgels.

28. The method of 27, wherein the protease is proteinase K.

29. The method of any one of 1 to 28, wherein the step of purifying genomic DNA from cells contained within the population of solidified microgels comprises a step of washing the population of solidified microgels, wherein the step of washing the population of solidified microgels comprises contacting the population of solidified microgels with a washing buffer.

30. The method of any one of 1 to 29, wherein the step of encapsulating a population of single cells in molten gel droplets and the step of barcoding the fragmented genomic DNA or an amplification product thereof are performed using a microfluidic device.

31. The method of any one of 1 to 30, wherein one or more of the steps of

-   solidifying the population of molten gel droplets to provide a population of solidified microgel droplets;
-   breaking the emulsions of the solidified microgel droplets to provide a population of solidified microgels;
-   exposing the population of solidified microgels in bulk to lysis conditions sufficient to lyse cells contained within the population of solidified microgels; and
-   purifying genomic DNA from cells contained within the population of solidified microgels in bulk to provide a population of solidified microgels comprising purified genomic DNA,

are not performed using a microfluidic device.

32. The method of any one of 1 to 31, wherein the population of single cells is a heterogeneous population of single celled microorganisms.

It will be apparent to one of ordinary skill in the art that various changes and modifications can be made without departing from the spirit or scope of the invention.

EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of the invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Centigrade, and pressure is at or near atmospheric.

All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference.

The present invention has been described in terms of particular embodiments found or proposed to comprise preferred modes for the practice of the invention. It will be appreciated by those of skill in the art that, in light of the present disclosure, numerous modifications and changes can be made in the particular embodiments exemplified without departing from the intended scope of the invention. All such modifications are intended to be included within the scope of the appended claims.

Example 1: Ultrahigh-Throughput Single Cell Genome Sequencing with Droplet Microfluidic Barcoding

The present disclosure provides ultrahigh-throughput single cell genomic sequencing (SiC-seq), a droplet microfluidic method capable of sequencing >50,000 single cell genomes per run. The method is validated by sequencing an artificial population of microbes containing known species at controlled proportions, obtaining ˜0.1% average coverage per cell, uniform genomic sampling, and accurate estimates of species proportion. Moreover, SiC-seq generates a metagenomic database in which reads are grouped by single cells. This database, in turn, enables a new kind of “in silico cytometry” similar to conventional flow cytometry, except that all sorting occurs computationally based on genomic sequence markers, and these biomarkers need not be specified a priori, before the data are collected. To demonstrate this, SiC-seq and in silico cytometry were applied to a sample of marine microbes and used to measure how antibiotic resistance genes, virulence factors, and phage-associated sequences are distributed throughout the population. The ability to repeatedly sort through a population of genomes without having to perform additional wet lab experiments allows rapid iteration through hypotheses, so that what is learned from one analysis can inform the next. It is valuable for generating correlation maps between characteristic sequences, to infer how different phenotypes are correlated within single cells, and how genetic elements spread through a community.

Materials and Methods:

Microfluidic Devices: To fabricate the microfluidic devices, poly(dimethylsiloxane) (Dow Corning, Sylgard 184) was poured over a negative photoresist (MicroChem, catalog no. SU-8 3025) patterned on a silicon wafer (University Wafer) using UV photolithography. The PDMS devices were cured in an oven for 1 hour, extracted with a metal scalpel, and punched with a 0.75 mm biopsy core (World Precision Instruments, catalog no. 504529) to create inlets and outlets. Devices were bonded to a glass slide using an oxygen plasma cleaner (Harrick Plasma) and the channels treated with Aquapel (PPG Industries) and baked at 80° C. for 10 min to render them hydrophobic.

Barcode Emulsions: Barcode emulsions were prepared through a digital PCR process wherein barcode oligonucleotides were amplified as single molecules in droplets containing PCR reagents. Barcode oligonucleotides (GCAGCTGGCGTAATAGCGAGTACAATCTGCTCTGATGCCGCATAGNNNNNNNNNNNNNNNTAAGCCAGCCCCGACACT) (SEQ ID NO:5) (IDT) at 0.01 pM concentration were added to a PCR reaction mix containing 1× NEB Phusion Hot Start Flex Master Mix (NEB, catalog no. M0536L), 2% (w/v) Tween 20 (Sigma-Aldrich, catalog no. P9416), 5% (w/v) PEG-6000 (Santa Cruz Biotechnology, catalog no. sc-302016), and 400 nM primers FL128 (SEQ ID NO: 6) (CTGTCTCTTATACACATCTCCGAGCCCACGAGACGTGTCGGGGCTGGCTTA) and FL129 (SEQ ID NO: 7) (CAAGCAGAAGACGGCATACGAGATCAGCTGGCGTAATAGCG, contains P7 adapter sequence) (IDT). The PCR mixture and HFE-7500 fluorinated oil (3M) with 2% (w/w) PEG-PFPE amphiphilic block copolymer surfactant (008-Fluoro-surfactant, Ran Technologies) were loaded into separate 1 mL syringes (BD) and injected at 300 and 500 μL/hr, respectively, into a flow-focusing droplet maker using syringe pumps (New Era, catalog no. NE-501) controlled with a custom Python script (“https:” followed by “//github.” followed by “com/AbateLab/Pump-Control-Program”). The emulsion was collected in PCR tubes, and the oil underneath the emulsion was removed via pipette and replaced with FC-40 fluorinated oil (Sigma-Aldrich, catalog no. 51142-49-5) with 5% (w/w) PEG-PFPE amphiphilic block copolymer surfactant for improved thermal stability. The emulsion was thermal cycled (Bio-Rad, T100) with the following program: 98° C. for 3 min, followed by 40 cycles with 2° C. per second ramp rates of 98° C. for 10 s, 62° C. for 20 s, and 72° C. for 20 s, followed by a hold at 12° C. Fluorescent DNA staining using 10× SYBR Green I (Thermo Fisher Scientific) in HFE-7500 oil was used to quantify the barcode encapsulation rate under a fluorescent microscope (Life Technologies, catalog no. AMAFD1000).

Cell Culture and Counting: To generate an artificial community with which to validate the SiC-seq workflow, liquid cultures of Staphylococcus epidermidis, Saccharomyces cerevisiae (strain S288c), and Bacillus subtilis (strain 168) were grown overnight in a shaking incubator. The following culture conditions were used: Staphylococcus epidermidis and Bacillus subtilis were grown in 3 mL LB broth at 37° C.; Saccharomyces cerevisiae was grown in 3 mL YPD broth at 30° C. Cell concentration was determined by manually counting serial dilutions of the liquid culture on plastic slides (Thermo Fisher Scientific, catalog no. C10228) using a microscope. The cultures were kept at 4° C. before being used in the microfluidic experiment (see section titled Cell Encapsulation in Agarose Microgels).

Water Sample Collection and Filtering: To obtain a natural sample of a microbial community, marine water was collected from Ocean Beach in West San Francisco, Calif., USA (37°44′55.6″N 122°30′33.6″W). Approximately 2 L of water was obtained by submerging two 1000 mL glass bottles below the water surface ˜20 m from the shoreline. Samples were placed on ice during transport to the lab. 100 mL of the sample was passed through a 40 μm cell strainer (Corning, product no. 352340) to remove large debris, including sand. The sample was then loaded into a 0.45 μm vacuum filter (Millipore, catalog no. SCHVU01RE); this filtering step separates microbes, which are captured on the membrane, from viruses, which are discarded in the filtrate. The membrane was extracted from the apparatus using a scalpel and inserted into a 15 mL centrifuge tube, to which 5 mL of PBS was added. The tube was vortexed at high speed for ˜2 min to free the bacterial cells from the membrane. Finally, the cell solution was loaded into a 10 mL syringe and passed through a 5 μm syringe filter (Millipore, catalog no. SLSV025LS) to remove remaining large particulates. The marine cells were counted using the same protocol as the liquid cultures.

Cell Encapsulation in Agarose Microgels: To prepare the artificial community for processing through the SiC-seq workflow, the frozen stock of cells (Zymo Research, catalog no. D6300) was thawed gently in a room-temperature water bath. Cell concentration was determined by manual cell counting under a microscope, and the suspension was diluted to an appropriate concentration for single cell encapsulation. The calculated volume of cell solution was transferred to a 1.5 mL centrifuge tube (Fisher Scientific) and washed twice in 1 mL PBS. The cells were re-suspended in a 1 mL solution of PBS containing 17% OptiPrep Density Gradient Medium (Sigma-Aldrich), 0.1 mg/mL BSA (Sigma-Aldrich, catalog no. A9418), and 1% (v/v) Pluronic F-68 (Life Technologies). The cell solution was loaded into a 1 mL syringe and placed on a syringe pump (New Era, catalog no. NE-501). 1 mL of a 3% solution of low gelling temperature agarose (Sigma-Aldrich, catalog no. A9414) in TE buffer (Teknova, catalog no. T0225) was prepared in a 1.5 mL centrifuge tube and heated on a block at 90° C. for approximately 10 minutes to completely dissolve the agarose powder. The hot agarose was transferred to a 1 mL syringe and placed on a syringe pump. To keep the agarose molten during the microfluidic experiment, a personal space heater was positioned ˜5 cm from the agarose syringe and set to run continuously at high heat. HFE-7500 fluorinated oil with 2% (w/w) de-protonated Krytox surfactant (DuPont, catalog no. 157FSH) was loaded into a 3 mL syringe. The cell solution, molten agarose, and oil were injected into the co-flow droplet maker at flow rates of 200, 200, and 400 μL/hr, respectively, to form the 1.5% agarose microgels. Approximately 500 μL of droplets were collected in a 15 mL centrifuge tube on ice and incubated for 30 min at 4° C. to ensure complete solidification of the microgels.
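The 1.5% final agarose concentration follows from the equal flow rates of the two aqueous streams. The following is a minimal sketch of that arithmetic, assuming the two streams mix completely within each droplet; the function name is illustrative and is not part of the disclosed workflow.

```python
def mixed_concentration(stock_percent, stock_flow, other_flow):
    """Concentration of a solute after two co-flowing aqueous streams mix.

    stock_percent: concentration in the stream that carries the solute (%)
    stock_flow, other_flow: flow rates in the same units (e.g. uL/hr)
    Assumes complete mixing inside each droplet (an idealization).
    """
    return stock_percent * stock_flow / (stock_flow + other_flow)


# 3% molten agarose at 200 uL/hr co-flowed with cell suspension at 200 uL/hr
agarose_final = mixed_concentration(3.0, 200, 200)
print(agarose_final)  # 1.5
```

The same dilution applies to components carried only by the cell stream, which are likewise halved in the droplet.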

Resuspending Microgels in Aqueous Buffer: The droplets were centrifuged at 300 g for 1 min to maximize separation of the emulsions from the oil. The oil layer was extracted from the tube using a 5 mL syringe and discarded. Emulsions were broken using 2 mL of a 10% (v/v) solution of perfluorooctanol (Sigma-Aldrich, catalog no. 370533) in HFE-7500; the emulsions were then mixed by pipetting and centrifuged at 300 g for 1 min. The oil was removed from the tube using a syringe and the droplet breaking step was repeated. Following droplet breaking, 2 mL of hexane containing 1% (v/v) Span 80 (Sigma-Aldrich) was added to the microgels to dissolve any remaining oil, and this solution was mixed and centrifuged at 300 g for 1 min. The hexane supernatant was removed from the tube and the hexane addition step was repeated. Finally, the microgels were washed three times in 10 mL of aqueous TE buffer containing 0.1% (v/v) Triton X-100 nonionic surfactant (Sigma-Aldrich). The microgels were centrifuged at 1000 g for 2 min and the supernatant aspirated between washes. The washed microgels were stored in 5 mL TE buffer at 4° C. prior to cell lysis.

Cell Lysis in Microgels: To lyse the cells in the microgels, the particles were submerged in 2 mL of TE buffer containing 10 mM DTT (manu), 2.5 mM EDTA (Teknova), and 10 mM NaCl (Sigma-Aldrich). The following quantities of lytic enzymes were also included: 4 U zymolyase (Zymo Research), 10 U lysostaphin (Sigma-Aldrich, catalog no. L7386), 100 U mutanolysin (Sigma-Aldrich, catalog no. M9901), and 40 mg lysozyme (MP Biomedicals, catalog no. 195303). Cell lysis proceeded overnight in a shaking incubator at 37° C. The turbid lysate mixture was centrifuged at 1000 g for 1 min, the supernatant removed, and 3 mL of a solution containing 0.5% (w/v) lithium dodecyl sulfate (Sigma-Aldrich) and 10 mM EDTA in TE buffer was added, along with 4 U of Proteinase K (NEB), to solubilize cell debris and digest cellular proteins. The solution was incubated at 50° C. on a heating block for 30 min. Following lysis, the microgels were thoroughly washed to ensure complete removal of detergents and other chemical species which may inhibit downstream molecular biology reactions. The following washes were performed in 10 mL volumes, with centrifugation at 1000 g between washes: one wash with 2% (v/v) Tween 20 in water; one wash in 100% ethanol (Koptec) to denature any remaining Proteinase K; and five washes with 0.02% (v/v) Tween 20 in water.

Tagmentation of Genomic DNA in Microgels: Using reagents from a Nextera DNA Library Prep Kit (Illumina, catalog no. FC-121-1030), the washed and lysed gels containing high-molecular-weight genomic DNA were simultaneously fragmented and tagged with a common adapter sequence. Microgels were re-encapsulated into droplets to minimize cross-contamination during the tagmentation step. A solution of 192 μL DI water, 200 μL tagmentation buffer, and 8 μL Nextera enzyme was prepared and loaded into a 1 mL syringe. Microgels and the tagmentation solution were injected into the re-encapsulation device (FIG. 1, panel B). The re-encapsulated microgels were incubated in a 1.5 mL tube on a heating block at 50° C. for one hour.

Microfluidic Barcoding of Encapsulated Cells: Tagmented microgels, barcode droplets, and 500 μL of PCR solution containing 1× Invitrogen Platinum Multiplex PCR Master Mix (Thermo Fisher Scientific, catalog no. 4464268), 400 nM primers FL127 (AATGATACGGCGACCACCGAGATCTACACTCGTCGGCAGCGTC (SEQ ID NO:8), contains P5 adapter sequence) and FL129 (CAAGCAGAAGACGGCATACGAGATCAGCTGGCGTAATAGCG) (SEQ ID NO:7), 50× dilution of NT buffer from the Nextera XT Kit (0.2% SDS) (Illumina, catalog no. FC-131-1024), 1% (w/v) Tween 20, 1% (w/v) PEG-6000, and 2.5 U/μL Bst 2.0 WarmStart DNA Polymerase (NEB, catalog no. M0538S) were each loaded into a 1 mL syringe and injected into the merger device as shown in FIG. 1. HFE-7500 fluorinated oil with 2% (w/w) 008-Fluorosurfactant was used as the continuous phase of the emulsion. Merger of the barcode and gel droplet emulsions was achieved using an electrode connected to a cold cathode fluorescent inverter and DC power supply (Mastech). A voltage of 2.0 V at the power supply produced a ˜2 kV AC potential at the electrode, which caused touching droplets to merge. The emulsion was collected in a 0.5 mL thin-walled PCR tube (Applied Biosciences), and the HFE-7500 was replaced with FC-40 with 5% (w/w) 008-Fluorosurfactant prior to thermal cycling with the following protocol: 65° C. for 5 min, 95° C. for 2 min, then 30 cycles at 2° C./s ramp rates of 95° C. for 15 s, 60° C. for 1 min, and 72° C. for 1 min, and then 72° C. for 5 min with an optional 12° C. overnight hold. After thermal cycling, large (coalesced) droplets were removed using a micropipette, and the emulsion was broken by addition of 20 μL of perfluorooctanol and brief centrifugation in a micro-centrifuge. The upper aqueous phase was collected and the DNA library was purified using a Zymo DNA Clean & Concentrator-5 kit (Zymo Research). The library was size-selected for DNA fragments in the 200-600 bp range using Agencourt AMPure XP beads (Beckman Coulter), quantified with a Bioanalyzer 2100 instrument and High Sensitivity DNA chip (Agilent), and sequenced on an Illumina MiSeq using a custom index primer (FL166).

Generating the SiC-Reads Database: Raw reads from the MiSeq-generated FASTQ files were filtered by quality and grouped by barcode sequence using the Python script barcodeCleanup.py. A given read was discarded if more than 20% of its bases had a Q-score less than Q20, and all reads associated with a barcode containing fewer than 50 reads were discarded. This step ensured that all barcode groups, representing single cells, contain a sufficient number of high-quality reads. The resulting reads were exported to a table in a SQLite database with fields containing the barcode sequence, barcode group size, a unique read ID number, and read sequence. When the reference genomes were known, as in the case of the synthetic cell population experiment, the reads were aligned using bowtie2 v2.2.9 with default settings and the SQLite table was updated with relevant alignment information for each read. For environmental samples, the reads were classified by taxonomy using Kraken v0.10.5 with the “--quick --min-hits 2” options set, and the output was exported to the SQLite database. The script krakenAnalysis.py assigns taxonomic identities from the Kraken database to barcode groups by a majority rule, in which each barcode group is classified according to the most common taxonomic label among its classifiable reads. Barcode group purity was calculated from reference alignment data or phylogenetic labels using the script purity.py.
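The filtering and grouping rules above can be sketched as follows. This is an illustrative sketch only and is not the barcodeCleanup.py script itself; the input format (tuples of barcode, sequence, and quality string), the table and column names, and the Phred+33 quality encoding are assumptions made for the example.

```python
import sqlite3
from collections import defaultdict

MIN_Q = 20            # Q-score threshold from the text
MAX_LOW_FRAC = 0.20   # discard a read if more than 20% of bases fall below MIN_Q
MIN_GROUP_SIZE = 50   # discard barcode groups with fewer than 50 reads


def passes_quality(qual, offset=33):
    """True if at most 20% of the bases have a Phred score below Q20.

    qual is a FASTQ quality string; Phred+33 encoding is assumed.
    """
    if not qual:
        return False
    low = sum(1 for ch in qual if ord(ch) - offset < MIN_Q)
    return low / len(qual) <= MAX_LOW_FRAC


def build_database(records, conn):
    """Filter reads, group them by barcode, and store the surviving groups.

    records: iterable of (barcode, sequence, quality) tuples (hypothetical input).
    conn: an open sqlite3 connection.
    """
    groups = defaultdict(list)
    for barcode, seq, qual in records:
        if passes_quality(qual):
            groups[barcode].append(seq)

    # Table layout mirrors the fields named in the text; names are illustrative.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads ("
        "read_id INTEGER PRIMARY KEY, barcode TEXT, "
        "group_size INTEGER, sequence TEXT)"
    )
    for barcode, seqs in groups.items():
        if len(seqs) < MIN_GROUP_SIZE:
            continue  # too few high-quality reads to represent a cell
        conn.executemany(
            "INSERT INTO reads (barcode, group_size, sequence) VALUES (?, ?, ?)",
            [(barcode, len(seqs), s) for s in seqs],
        )
    conn.commit()
```

In this sketch the group-size cutoff is applied after quality filtering, so that only high-quality reads count toward the 50-read minimum; the text does not state the order, so this is one reasonable reading.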

In Silico Cytometry: Reads from the SiC-Reads database were aligned, using bowtie2 v2.2.9 with the --very-sensitive and --end-to-end settings, to reference sequences of interest (AR database obtained from Gupta, S. K., et al., Antimicrob. Agents Chemother., 58:212-220 (2014); VF database obtained from core VF genes at the virulence factor database (VFDB; Chen, L. et al., Nucleic Acids Res., 33:D325-328 (2005)); phage sequence database obtained from the Phage genome database accessed in May 2016 at “http:” followed by “//www.ebi.ac.” followed by “uk/genomes/phage.html”). Mapping reads were then filtered for MapQ>2 in order to remove ambiguously mapping reads. Barcode groups containing reads that map to the databases were annotated as containing the target sequence and were exported for further analysis if they were taxonomically classified with purity >0.8. To generate the heatmap for transduction potential, all reads that were associated with a phage and a Kraken-classified barcode group were extracted and grouped according to phage type. Duplicate and near-duplicate reads were removed. The heatmap intensities were calculated as follows: for a given pair of bacterial hosts, the total number of host-phage-host connections in the database was counted. To normalize the data by host abundance, this number was divided by the total number of barcode groups associated with the two hosts.
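The heatmap normalization described above can be illustrated with a short sketch. The text does not define a host-phage-host connection precisely; the sketch assumes that a connection is a pair of barcode groups, one from each host, that carry reads from the same phage type. The function name and input layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def transduction_heatmap(phage_hits, group_counts):
    """Normalized host-phage-host connection counts for every host pair.

    phage_hits: iterable of (barcode_group, host, phage_type), one entry per
        barcode group and phage type after duplicate removal.
    group_counts: dict mapping host -> total number of barcode groups.

    Assumption: a connection is a pair of barcode groups, one from each
    host, that carry reads from the same phage type.
    """
    # phage_type -> host -> set of barcode groups carrying that phage
    carriers = defaultdict(lambda: defaultdict(set))
    for group, host, phage in phage_hits:
        carriers[phage][host].add(group)

    raw = defaultdict(int)
    for hosts in carriers.values():
        for a, b in combinations(sorted(hosts), 2):
            raw[(a, b)] += len(hosts[a]) * len(hosts[b])

    # Normalize by the combined number of barcode groups of the two hosts
    return {
        pair: n / (group_counts[pair[0]] + group_counts[pair[1]])
        for pair, n in raw.items()
    }
```

Host pairs are keyed in alphabetical order, so each pair appears once in the output regardless of the order in which hits are supplied.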

Generating the Antibiotic Resistance Network for the Sequenced Whole Genomes as Depicted in FIG. 10A: The antibiotic resistance graph in FIG. 10A was generated using references for the 6 genomes most commonly associated with antibiotic resistance in the SiC-Reads database of the San Francisco Coast water microbial community. The following genomes (with accession numbers) were downloaded from the NCBI RefSeq repository: >CP003841.1|Alteromonas macleodii ATCC 27126; >CP010434.1|Bacillus subtilis subsp. spizizenii strain NRS 231; >CP000884.1|Delftia acidovorans SPH-1; >CP001918.1|Enterobacter cloacae subsp. cloacae ATCC 13047; >AE002098.2|Neisseria meningitidis MC58; and >AE017283.1|Propionibacterium acnes KPA171202. These genomes were combined into a single FASTA file and passed to a short read simulator, wgsim v0.3.2, which generated 10M single-end reads of 70 bp each with a base error rate of 0. These reads were aligned to the antibiotic resistance gene reference using bowtie2 in ‘local’ mode with default sensitivity settings. All unaligned sequences were removed using samtools (samtools view -b -F). The aligned sequences in SAM format were imported into Cytoscape v3.4.0 and the network shown in FIG. 10A was generated using the reference genus and antibiotic resistance genes as the network targets and sources, respectively. The darkness of the graph's edges scales linearly with the total number of connections in the data, where darker lines have a greater number of associations.

Generating the VF Ratios for the Sequenced Whole Genomes in FIG. 10B: The VF ratio calculations in FIG. 9B were reproduced using reference genomes for the genera shown in the figure. The complete genomes of all species associated with these 12 genera were downloaded from the RefSeq database using the Perl script ncbiDownloader.pl. Genomes were pooled into FASTA files labeled by genus. From these reference files, a Python script (bargroupGenerator.py) generated simulated barcode groups of 200 reads per group, with each single-end read 150 bp long. The number of simulated barcode groups generated for a given genus was equal to the number of barcode groups identified for this genus in the San Francisco Coast water sample. The simulated barcode group reads were then aligned to the original virulence factor database using bowtie2 v2.2.9 in ‘local’ alignment mode with default sensitivity settings. Unaligned sequences were removed using samtools v1.3.1 (samtools view -b -F) and the remaining aligned reads were used to produce the data shown in FIG. 10B.
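A simulated barcode group of the kind described above can be sketched as follows. This is not the bargroupGenerator.py script; it is an illustration under the stated assumptions that each simulated group draws all of its reads from one randomly chosen genome of the genus, at uniformly random positions and without sequencing errors. All names are hypothetical.

```python
import random


def simulate_barcode_groups(genomes, n_groups, reads_per_group=200,
                            read_len=150, seed=0):
    """Return n_groups lists of simulated single-end reads.

    genomes: list of genome sequences (strings) pooled for one genus.
    Assumption: each group samples a single genome, mimicking one cell,
    and reads are error-free substrings at uniform random positions.
    """
    rng = random.Random(seed)
    usable = [g for g in genomes if len(g) >= read_len]
    if not usable:
        raise ValueError("no genome is at least read_len bases long")

    groups = []
    for _ in range(n_groups):
        genome = rng.choice(usable)
        last_start = len(genome) - read_len
        starts = [rng.randrange(last_start + 1) for _ in range(reads_per_group)]
        groups.append([genome[s:s + read_len] for s in starts])
    return groups
```

Setting n_groups to the number of barcode groups observed for the genus in the water sample reproduces the matching of group counts described in the text.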

Results:

SiC-Seq Workflow: Droplet microfluidics, with its ability to encapsulate and perform biological reactions on thousands of single cells per second, affords unparalleled potential for single molecule and single cell applications, including to uniformly amplify, accurately quantitate, and deeply sequence single molecules, and to culture, screen, and sequence the transcriptomes of single cells. However, single cell genome sequencing presents the unique challenge that each cell's genome is protected by membranes and proteins that may need to be removed before enzymatic processing is possible. The reagents often utilized in genome purification, including detergents, proteases, and high pH buffers, however, may be detrimental to enzymes used in preparation for sequencing, requiring that these steps be performed separately. In SiC-seq, this was addressed by encasing cells in hydrogel microspheres (microgels) that are permeable to molecules with hydraulic diameters smaller than the pore size, including enzymes, detergents, and small molecules, but sterically trap large molecules such as genomes. This allows for the use of a series of “washes” on millions of encased cells, to perform the steps of cell lysis and genome processing, while maintaining compartmentalization of each genome. Using a combination of microgel and microfluidic processing steps, the cells are lysed, genomes are fragmented, and unique barcodes are attached to all fragments, in a workflow that processes >50,000 cells in a few hours. The barcoded fragments for all cells can then be pooled and sequenced, and the reads grouped by barcode, providing a library of single cell genomes that can be subjected to additional downstream processing, including demographic characterization and in silico cytometry. A diagram of the workflow for SiC-seq is provided in FIG. 2.

An aspect of SiC-seq that facilitates ultrahigh-throughput processing and sequencing of single cells is the labeling of DNA fragments originating from the same genome with a sequence identifier (barcode) unique to that cell. The resultant products are chimeric, including a barcode sequence covalently linked to a random fragment of the cell genome. The barcodes allow all reads belonging to a given cell to be identified through shared sequence. To uniquely barcode the genomic fragments of many single cells, a library of unique barcode sequences is utilized. Recently published methods to barcode single cell transcriptomes introduce barcodes attached to solid beads or hydrogel spheres (Klein, A. M., et al., Cell, 161:1187-1201 (2015); Macosko, E. Z., et al., Cell, 161:1202-1214 (2015), the disclosures of each of which are incorporated by reference herein). Such methods may be utilized in connection with the methods described herein. However, in the embodiment of SiC-seq exemplified herein, liquid droplets containing the barcode sequences are merged with the genomes to be barcoded (See, e.g., Lan, F., et al., Nat. Commun., 7:11784 (2016), the disclosure of which is incorporated by reference herein).

To prepare a barcode droplet library, oligonucleotides including 15 random bases flanked by constant sequences were encapsulated at a rate of 0.1 using microfluidic flow focusing (FIG. 1, Panel A and FIG. 3A) (Garstecki, P., et al., Appl. Phys. Lett., 85:2649-2651 (2004), the disclosure of which is incorporated by reference herein). A single barcode molecule, however, is generally insufficient to label a cell's genome, and accordingly the barcode molecule is generally amplified prior to the barcoding process. To accomplish this, droplets are generated using PCR reagents and primers that are complementary to the constant regions of the barcodes and that contain the Illumina P7 flow cell adapter. The droplets are then thermal cycled to amplify the barcode sequences via digital droplet PCR. This approach efficiently generated ˜10 million barcode droplets in a few hours.
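The size of the barcode space implied by 15 random bases can be compared with the number of barcode droplets by a short calculation. The sketch below assumes that the random bases are synthesized uniformly and independently, which is an idealization; the function names are illustrative.

```python
import math

BARCODE_LEN = 15  # random bases in the barcode oligonucleotide


def barcode_space(length=BARCODE_LEN):
    """Number of distinct barcodes with four possible bases per position."""
    return 4 ** length


def prob_shared(n_barcodes, length=BARCODE_LEN):
    """Probability that a given barcode is also drawn by at least one of
    the other n_barcodes - 1 droplets.

    Assumes uniform, independent random bases (an idealization).
    """
    space = barcode_space(length)
    # 1 - (1 - 1/space)**(n - 1), computed in a numerically stable way
    return -math.expm1((n_barcodes - 1) * math.log1p(-1.0 / space))


print(barcode_space())  # 1073741824
print(prob_shared(10_000_000))
```

Under these assumptions, with 10 million barcodes drawn from roughly one billion possible sequences, a given barcode is expected to be shared with another droplet a little under 1% of the time.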

Before the single cell genomes can be barcoded, they are generally physically isolated and purified from the cell body and fragmented. To accomplish this, single cells were encased in agarose microgels using a two-stream co-flow droplet maker, which merged a cell suspension stream with a molten agarose stream, forming a droplet consisting of an equal volume of both streams (FIG. 3B and FIG. 1, Panel A). The droplet maker ran at ˜10 kHz, which allowed for the generation of 10 million 22 μm droplets in ˜20 minutes, for a total aqueous volume of ˜60 μL. Hence, droplet generation was fast and the total volume consumed was small, allowing for the loading of cells at a rate of 1:10 to minimize multi-cell encapsulation; this also reduced the likelihood that coalescence during thermal cycling would mix droplets containing different genomes, which would yield undesirable non-single cell barcode groups.
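The droplet generation figures above, and the effect of loading cells at 1:10, can be checked with a short calculation. The sketch assumes spherical droplets and Poisson-distributed cell loading with a mean of 0.1 cells per droplet; the Poisson model is a common idealization of random encapsulation and is an assumption here, not a statement from the text.

```python
import math


def droplet_volume_pl(diameter_um):
    """Volume of a spherical droplet in picoliters (1 um^3 = 1e-3 pL)."""
    return math.pi / 6.0 * diameter_um ** 3 * 1e-3


def poisson_pmf(k, lam):
    """Probability of exactly k cells in a droplet at mean occupancy lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


rate_hz = 10_000          # droplet generation rate (~10 kHz)
n_droplets = 10_000_000   # droplets generated
diameter_um = 22          # droplet diameter

minutes = n_droplets / rate_hz / 60                        # about 17 minutes
total_ul = n_droplets * droplet_volume_pl(diameter_um) * 1e-6  # pL to uL, about 56 uL

lam = 0.1                 # assumed mean cells per droplet for 1:10 loading
p_empty = poisson_pmf(0, lam)
p_single = poisson_pmf(1, lam)
p_multi = 1.0 - p_empty - p_single
# Fraction of occupied droplets that hold more than one cell (about 5%)
p_multi_given_occupied = p_multi / (1.0 - p_empty)

print(f"{minutes:.1f} min, {total_ul:.1f} uL")
print(f"empty {p_empty:.3f}, single {p_single:.3f}, multi {p_multi:.4f}")
```

These values are consistent with the approximate figures of ˜20 minutes and ˜60 μL given above, and illustrate why dilute loading keeps most occupied droplets at a single cell.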

The molten agarose droplets were collected into PCR tubes on ice, solidifying them. The solidified microgels were then transferred from oil to water, while maintaining encapsulation of the cells, which were then subjected to lysis and genome purification. To lyse the cells, the solidified microgels were incubated overnight in a mixture of lytic enzymes, digesting the protective microbial cell walls (see, Materials and Methods). They were then incubated in a mixture of detergents and proteases for 30 minutes, solubilizing lipids and digesting proteins, preserving only high molecular weight genomic DNA, which was verified by staining with SYBR green dye. To fragment the genomes and attach the universal sequences to act as PCR handles, the solidified microgels were re-encapsulated in the Nextera® reaction (FIG. 1, Panel B). Importantly, because the transposases are dimeric, the fragmented genome remains intact as a macromolecular complex, remaining sterically encased within the hydrogel network (FIG. 4).

After the genomes are purified and fragmented, they are barcoded for sequencing. A microfluidic device is used that encapsulates each solidified microgel in PCR reagents and then merges the solidified microgel with a barcode droplet (FIG. 3C and FIG. 1, panel C). Monodisperse microgels have the unique and valuable property that, because they are compliant, they can flow at high volume fraction (>0.65) through microfluidic channels without clogging, causing them to order and flow periodically into a droplet generator. By matching the droplet period with the microgel injection period, it is possible to achieve efficient loading of microgels in droplets, a technique that has been exploited for human genome haplotyping and single cell transcriptome sequencing with minimal cell loss and is a part of commercialized droplet microfluidic instrumentation. The droplets containing fragmented genome and barcode are collected into a PCR tube and thermal cycled, splicing the barcode sequences onto the genomic fragments via complementarity through the PCR handles added by the transposase. At this point, the spliced fragments contain both the P5 and P7 Illumina sequencing adaptors required for sequencing on the Illumina platforms. During thermal cycling, some droplets coalesce, generating barcode clusters corresponding to multiple cells. These coalesced droplets were removed using a micropipette at the end of thermal cycling, and then the purified droplets were chemically ruptured, and their contents pooled and prepared for sequencing (see, Materials and Methods). After sequencing, the reads were filtered by quality and grouped by barcode, providing single cell genomic sequence data.

Validation of SiC-Seq on an Artificial Microbial Community: The objective of SiC-seq is to provide single cell genomic sequences bundled in barcode groups; this data can then be used for microbial demographic characterization, to correlate interesting sequences within the same genome, and as potential scaffolds for genome assembly. To validate that SiC-seq generated single cell barcode groups, it was applied to an artificial community containing five Gram-positive bacteria, three Gram-negative bacteria, and two yeasts mixed in equal proportion by genomic DNA content (Table 1, below). To confirm that the lysis procedures were reasonably general, this mixture included Gram-positive bacteria and fungi, which are typically difficult to lyse. A single-cell library was prepared from this community using SiC-seq and it was sequenced on an Illumina MiSeq, yielding ˜6 million paired-end reads of 150 bp after quality filtering. Reads were grouped by barcode, and groups with <50 reads, which likely represent PCR-mutated barcode sequences, were discarded, yielding a final set of 48,989 barcode groups (FIG. 5A). Each barcode group theoretically represents a low-coverage genome of a cell, with an average depth of coverage of 1% and a distribution that is similar for all microbes (FIG. 11).

TABLE 1
Listing of the ten cell types in artificial community

Organism                    GC %    Gram
Bacillus subtilis           43.8    Positive
Cryptococcus neoformans     48.2    N/A (Yeast)
Enterococcus faecalis       37.5    Positive
Escherichia coli            56.8    Negative
Lactobacillus fermentum     52.8    Positive
Listeria monocytogenes      38.0    Positive
Pseudomonas aeruginosa      66.2    Negative
Saccharomyces cerevisiae    38.4    N/A (Yeast)
Salmonella enterica         52.2    Negative
Staphylococcus aureus       32.7    Positive

To determine whether the barcode groups indeed correspond to single cells, all reads were mapped to the reference genomes of the ten known species. If two microbes reside within the same barcode group, reads will map to both genomes. A group purity score was defined as the fraction of reads in a group mapping to the most-mapped reference. The distribution of group purity scores was strongly skewed to high values, with the majority of purity scores over 95%, suggesting that most barcode groups represent single cells; this result is consistent even taking into account the different genome sizes of the ten species (FIG. 5B and FIG. 12) as well as when purity is examined for each individual species (FIG. 13). The rare barcode groups with low (<80%) purity scores were further examined, and it was determined that the majority of those barcode groups represent rare cases where two cells were encapsulated into one droplet or the occasional coalescence of two single-cell containing droplets (FIG. 14).
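By way of non-limiting illustration, the group purity score defined above can be sketched in Python as follows. The function name and the input format (a list holding one reference-genome label per mapped read) are assumptions made for this sketch and are not taken from the analysis scripts used in the example.

```python
from collections import Counter


def group_purity(read_refs):
    """Return (dominant_reference, purity) for one barcode group.

    read_refs: list with one reference-genome label per mapped read
    (hypothetical input format). Purity is the fraction of reads that
    map to the most-mapped reference in the group.
    """
    if not read_refs:
        return None  # no mapped reads, purity is undefined
    counts = Counter(read_refs)
    dominant_ref, dominant_count = counts.most_common(1)[0]
    return dominant_ref, dominant_count / len(read_refs)
```

A group in which every read maps to the same genome scores 1.0, whereas a group containing reads from two co-encapsulated cells scores lower, in keeping with the <80% threshold used above to flag mixed groups.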

To determine whether SiC-seq barcode abundances reflect the organism abundances in the dataset, abundance estimates calculated via short-read alignment, metagenomic sequence classification, and counting under bright-field microscopy were compared (FIG. 5C and FIG. 15). It was found that all methods are in reasonable agreement, both when reads are pooled and analyzed in bulk using these methods and when species identities are assigned to each barcode based on the most commonly mapped species in a group. This demonstrates that SiC-seq enables estimation of species abundance in a microbial population consistent with accepted metagenomic methods.

To investigate coverage distribution bias in SiC-seq, the normalized coverage distribution was plotted for reads aggregated from all barcode groups for each microbe (FIG. 5D, FIG. 5E and FIG. 16). With the exception of coverage gaps due to differences between the genomic DNA abundances of cells within the standard microbial community, no significant coverage bias was observed. This indicated that the sampling of each genome within a barcode group was random, so that when all groups were overlaid a uniform distribution was obtained. The distribution of reads in individual barcode groups was further inspected and no significant bias was found (FIG. 6A, FIG. 6B). In addition, bias was minimal because each genome was amplified in a tiny volume of ˜65 pL, which has been shown to curtail the bias-inducing runaway of exponential amplification. Moreover, the sequencing library was composed of ˜50,000 amplified genomes and, as such, the amplification of each genome can be limited by the tiny volume while still producing sufficient total DNA for sequencing.

SiC-Seq Generates a Novel Type of Data which can be Analyzed Using in silico Cytometry: The genomic sequences generated using SiC-seq were grouped according to single cells, which is complementary to the sequences obtained by short-read metagenomic sequencing. Existing computational tools are ill-suited to analyzing this data, because they do not exploit the single cell barcode information unique to SiC-seq. To address this, a novel sequence analysis pipeline was utilized in which reads are organized hierarchically as barcode groups, generating a Single Cell Reads database (SiC-Reads) (FIG. 7). To build SiC-Reads, raw sequences were filtered by quality, grouped by barcode, and a taxonomic classification of each group was estimated using phylogenetic profilers. A purity score was also estimated based on the reads classifiable by the profiler, assigning a value equal to the fraction of reads mapping to the dominant taxon within the classifiable set. Additional properties of barcode groups and reads, such as presence of sequences corresponding to antibiotic resistance genes, can be added to the database as they are discovered during analysis.

The massive set of single cell genomes present in SiC-Reads provides new opportunities for discovering associations between sequences within single cells, in a process called in silico cytometry. SiC-Reads include a multidimensional collection of single cell genomes that can be sorted in silico, in analogy to what is commonly done with flow cytometry on single cells. While flow cytometry requires that a target biomarker be selected a priori and is limited in the number of biomarkers that can be used, in silico cytometry can be performed as many times and with as many sequence biomarkers as desired. The database can be sorted repeatedly to mine for correlations between different genetic sequences and structures. Moreover, as new associations are learned, new sorting parameters can be defined, enabling new discoveries without having to repeat the experiment, ultimately limited only by the completeness and accuracy of the single cell database.
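As a minimal sketch of how such in silico sorting might be expressed, the following Python example queries a small in-memory SQLite table for the number of barcode groups per taxon that carry a chosen sequence feature. The table layout, the column names, the feature labels, and the placeholder taxa ("TaxonA", "TaxonB") are all hypothetical and are shown only to illustrate the idea of repeatedly sorting the same database on different sequence biomarkers.

```python
import sqlite3

# Hypothetical schema: one row per read, tagged with its barcode, the taxon
# assigned to its barcode group, and an optional sequence feature label.
SCHEMA = """
CREATE TABLE reads (
    read_id     INTEGER PRIMARY KEY,
    barcode     TEXT NOT NULL,
    group_taxon TEXT,
    feature     TEXT
)
"""

# Made-up demonstration rows: (barcode, group_taxon, feature).
DEMO_ROWS = [
    ("BC1", "TaxonA", "AR"),
    ("BC1", "TaxonA", None),
    ("BC2", "TaxonA", "AR"),
    ("BC3", "TaxonB", "VF"),
    ("BC4", "TaxonB", "AR"),
    ("BC4", "TaxonB", "AR"),
]


def build_demo_db(rows):
    """Create an in-memory database holding the demonstration reads."""
    con = sqlite3.connect(":memory:")
    con.execute(SCHEMA)
    con.executemany(
        "INSERT INTO reads (barcode, group_taxon, feature) VALUES (?, ?, ?)",
        rows,
    )
    con.commit()
    return con


def groups_with_feature(con, feature):
    """Count distinct barcode groups per taxon that contain the feature."""
    cur = con.execute(
        "SELECT group_taxon, COUNT(DISTINCT barcode) FROM reads "
        "WHERE feature = ? GROUP BY group_taxon",
        (feature,),
    )
    return dict(cur.fetchall())
```

Because the sort is only a query, it can be repeated with any other feature label without re-running the experiment, which is the property emphasized above.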

To illustrate the power of SiC-seq and in silico cytometry, a microbial community recovered from coastal sea water of San Francisco was sequenced (see, Materials and Methods). ˜8 million reads of 150 bp length were obtained after quality filtering (representing ˜55% of raw reads), with which a SiC-Reads database was generated (FIG. 7). 601,348 (6.89%) of the reads were classified into taxa representing 99.8% bacteria, 0.04% archaea, and 0.16% viruses (FIG. 8, Panel A). Barcode groups were assigned a taxonomic classification based on the reads they contained, following the rule that more than 10% of reads must have a classification, and the group is classified according to the taxon with the most supporting reads. Most barcode groups were estimated to be high purity based on the classifiable sequences (˜91%), in accordance with the control sample (˜94%) (FIG. 8, Panel B). Using this data, in silico cytometry was demonstrated by exploring the distribution of antibiotic resistance, virulence factors, and phage sequences in the microbial community.
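The barcode group classification rule described above can be sketched as follows. This is an illustrative Python sketch only: the function name and the per-read input format are assumptions, and the tie-breaking behavior (the first taxon encountered wins) is a choice made here because the text does not specify one.

```python
from collections import Counter


def classify_group(read_taxa, min_classified_fraction=0.10):
    """Assign a taxon to a barcode group, or None if it cannot be classified.

    read_taxa: one entry per read in the group, either a taxon label or
    None when the profiler could not classify the read (hypothetical
    input format). A group is classified only if MORE than the given
    fraction of its reads have a classification.
    """
    if not read_taxa:
        return None
    classified = [taxon for taxon in read_taxa if taxon is not None]
    if len(classified) / len(read_taxa) <= min_classified_fraction:
        return None  # too few classifiable reads in this group
    # Taxon with the most supporting reads; ties go to the first one seen.
    return Counter(classified).most_common(1)[0][0]
```

For example, a 50-read group with six classified reads (12%) would be assigned its best-supported taxon, whereas a 50-read group with four classified reads (8%) would remain unclassified.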

Taxonomic Distribution of Antibiotic Resistance in Microbes Inhabiting the San Francisco Coastline: Antibiotic resistance (AR) has become increasingly common and represents a significant threat to global human health. Because antibiotics are the primary tool for fighting bacterial infections, understanding how AR genes spread in the natural environment is essential to maintaining effective counter-measures for bacterial diseases. Microbes can gain AR through numerous mechanisms, including mutation, acquisition of resistance-conferring genes, or even deletion of genes. While AR genes can be identified in most environments by short-read sequencing, scant information on how they are distributed among taxa is available, because obtaining this information usually requires testing or whole genome sequencing of single species; however, most species are uncultivable, precluding such analyses.

SiC-seq provides a unique opportunity to characterize the distribution of AR genes amongst all species in a sample, including uncultivable ones, because species can be classified based on reads in the barcode group and then associated with AR genes also present. To determine the distribution of AR genes among taxa in the dataset, the SiC-Reads database was searched for known AR genes, and 1,081 matching reads (0.012% of reads) were found, representing 108 (0.30%) of barcode groups. The taxonomic distribution of AR genes had a clear structure (FIG. 9A and FIG. 10, Panel A); differences are expected in the natural coastline environment compared to the environment of the isolated and sequenced strains. The most abundant taxa associated with AR were not the most abundant community members overall, suggesting that in this community certain taxa tend to associate more with AR genes. For example, aminoglycoside resistance was primarily found in Alteromonas spp., while beta-lactam resistance was widely spread, found in 4 out of 5 taxa. While not intending to be bound by any particular theory, it is possible that the broad-spectrum activity of beta-lactams has encouraged their heavy use by humans and, correspondingly, has resulted in widespread resistance in the coastal microbes of San Francisco. Aminoglycoside antibiotics, on the other hand, are not commonly used by humans and, thus, resistance against them is rare, with identified instances possibly representing genes naturally found in Alteromonas whose primary purpose is not to avoid aminoglycoside-mediated killing.

Associating Virulence Factors with Host Bacteria in a Microbial Community: Virulence factors (VFs), like AR genes, are important in determining the threat that specific microbes pose to human health. Many opportunistic pathogens reside in natural communities in the environment and cause outbreaks when transmitted to a suitable host. Therefore, monitoring and detecting potentially pathogenic microbes is important for public health. While metagenomic shotgun sequencing or DNA microarray methods can detect the abundance of VFs in a community, they cannot determine which microbes carry them, or whether multiple VFs are present in the same microbes, both of which are important determinants for the pathogenic potential of a species. Here, again, SiC-seq affords a unique opportunity to characterize VFs in a community and to associate them with specific host species.

The coastal microbial community database was searched for known VF genes, yielding matches in 1,949 (0.022%) reads in 101 (0.28%) barcode groups, consisting of 29 prevalent VFs distributed among 13 microbial genera. The abundances of taxa with VFs did not reflect that of the total population, suggesting that certain genera tend to carry more VFs than others. To quantify this, a VF ratio was calculated for each species as the ratio between the number of barcode groups containing VFs and the total number of barcode groups in the community for that species, and the results were normalized to the highest VF ratio for comparison (FIG. 9B). Haemophilus and Escherichia stood out amongst all species, both of which are known opportunistic human pathogens. Upon closer inspection, the main VFs detected in Haemophilus were lipooligosaccharides, which are the major constituents of Haemophilus outer membranes and an important determinant of host immune evasion. In Escherichia, the main VFs detected were the K1 capsule and Type III secretion system, both commonly present in virulent strains. When the VF ratios of the San Francisco coastline community were compared with ones calculated for publicly available whole genomes, downsampled to match the per-cell read depth (FIG. 10, Panel B), the ratios were found to be higher for the public genomes, indicating a bias towards pathogenic strains in currently sequenced genomes.
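The VF ratio and its normalization to the highest value can be illustrated with the short Python sketch below. The function name and the dictionary inputs (counts of barcode groups per taxon) are assumptions made for illustration; the counts used in any call would come from the SiC-Reads database rather than from this code.

```python
def normalized_vf_ratios(vf_groups, total_groups):
    """Compute VF ratios per taxon, normalized to the highest ratio.

    vf_groups: dict mapping taxon -> number of barcode groups with >= 1 VF.
    total_groups: dict mapping taxon -> total number of barcode groups.
    Both inputs are hypothetical data structures for this sketch.
    """
    ratios = {
        taxon: vf_groups.get(taxon, 0) / total
        for taxon, total in total_groups.items()
        if total > 0  # skip taxa with no barcode groups to avoid dividing by zero
    }
    highest = max(ratios.values(), default=0)
    if highest == 0:
        return {taxon: 0.0 for taxon in ratios}
    return {taxon: ratio / highest for taxon, ratio in ratios.items()}
```

After normalization, the taxon with the greatest tendency to carry VFs has a value of 1.0, and every other taxon is expressed relative to it.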

Determining Transduction Potential Between Bacteria in a Microbial Community: Many virulent bacterial strains are thought to arise from horizontal gene transfer aided by cross-infection of bacteriophages. Phages can modify the genomes of their hosts, leaving a copy of their own genome behind, or transporting fragments of one species to another in a process known as transduction, which is thought to be an important mechanism for generating virulent bacterial strains. However, as with AR and VF genes, characterizing the distribution of these mobile elements is challenging in an ecological context because confident identification of foreign genomic fragments within a specific host requires sequencing cultures of single species or single cells. Nevertheless, this information is extremely valuable for understanding how bacteria transfer genetic material in general, and how virulent new strains may emerge via this mechanism in particular.

To explore transduction in the microbial community, the SiC-Reads database of the San Francisco coastal community was searched for barcode groups containing phage sequences. A phage sequence found in a bacterial genome is evidence that the phage could potentially infect the host, an association that is normally extremely difficult to make for uncultivable microbes and their likely uncultivable infecting phages. Matches were found in 6,805 (0.078%) reads representing 260 (0.72%) barcode groups and 106 phage genomes. Since transduction can occur between two host cells infectable by the same phage, the potential for transduction depends on how many types of phages infect both hosts, and how often these phages infect both taxa. To visualize this, the number of times sequences matching the same phage were detected in two bacterial taxa was summed, normalized by the number of barcode groups in those genera, and plotted (FIG. 9C). According to this analysis, Delftia and Neisseria, which are the most closely related of the taxa in this analysis, have the highest potential for transduction. More broadly, SiC-seq's ability to detect these sequences and correlate them within single genomes provides a novel and powerful approach for studying phage-host interactions. Such interactions are understudied due to the lack of high throughput methods with which to sequence single cells, but are critical to understanding microbial community dynamics.

Example 2: Microfluidic Barcoding of Single-Cells by MALBAC Reaction

In this example, a synthetic community of cells with known composition is used to evaluate amplification and barcoding performance. Cells encapsulated in hydrogels may be used in the place of the cell suspension described in this procedure.

Materials and Methods:

Microfluidic Devices: To fabricate the microfluidic devices, poly(dimethylsiloxane) (Dow Corning, Sylgard 184) was poured over a negative photoresist (MicroChem, catalog no. SU-8 3025) patterned on a silicon wafer (University Wafer) using UV photolithography. The PDMS devices were cured in an oven for 1 hour, extracted with a metal scalpel, and punched with a 0.75 mm biopsy core (World Precision Instruments, catalog no. 504529) to create inlets and outlets. Devices were bonded to a glass slide using an oxygen plasma cleaner (Harrick Plasma) and the channels treated with Aquapel (PPG Industries) and baked at 80° C. for 10 min to render them hydrophobic.

Barcode Emulsions: Barcode emulsions were prepared through an asymmetric digital PCR process wherein barcode oligonucleotides were amplified as single molecules in droplets containing PCR reagents. The asymmetric PCR favors production of single-stranded barcode products. Barcode oligonucleotide BD23 (GTATGCGACTTCAGTACATCGTCCCGATGCTCTGCACAGTCGACCGCTGANNNNNNNNNNNNNNNCAGCGATCCCGAGTCAGATCGATGCCACTGAGACCTGTGAGTGATGGTTGAGGTAGTGTGGAG) (SEQ ID NO:9) (IDT) at 0.01 pM concentration was added to a PCR reaction mix containing 1× NEB Phusion Hot Start Flex Master Mix (NEB, catalog no. M0536L), 2% (v/v) Tween 20 (Sigma-Aldrich, catalog no. P9416), 5% (w/v) PEG-6000 (Santa Cruz Biotechnology, catalog no. sc-302016), 0.2 μM limiting primer BD27 (CTCCACACTACCTCAACCATCACTCACAGGTCTCAGTGGC) (SEQ ID NO:10), and 1 μM excess primer BD28 (GTATGCGACTTCAGTACATCGTCCCGATGCTCTGCACA) (SEQ ID NO:11) (IDT). The PCR mixture and HFE-7500 fluorinated oil (3M) with 2% (w/w) PEG-PFPE amphiphilic block copolymer surfactant (008-Fluoro-surfactant, Ran Technologies) were loaded into separate 1 mL syringes (BD) and injected at 300 and 500 μL/hr, respectively, into a co-flow droplet maker (FIG. 21) using syringe pumps (New Era, catalog no. NE-501) controlled with a custom Python script (“https:” followed by “//github.” followed by “com/AbateLab/Pump-Control-Program”). The emulsion was collected in PCR tubes, and the oil underneath the emulsion removed via pipette and replaced with FC-40 fluorinated oil (Sigma-Aldrich, catalog no. 51142-49-5) with 5% (w/w) PEG-PFPE amphiphilic block copolymer surfactant for improved thermal stability. The emulsion was thermal cycled (Bio-Rad, T100) with the following program: 98° C. for 3 min, followed by 40 cycles with 2° C. per second ramp rates of 98° C. for 10 s, 63° C. for 20 s, and 72° C. for 20 s, followed by a hold at 12° C.
Fluorescent DNA staining using 10× SYBR Green I (Thermo Fisher Scientific) in HFE-7500 oil was used to quantify barcode encapsulation rate under a fluorescent microscope (Life Technologies, catalog no. AMAFD1000).

Preparation of Cell Suspension: To prepare the artificial community for processing through the SiC-seq workflow, the frozen stock of cells (Zymo Research, catalog no. D6300) was thawed gently in a room-temperature water bath. Cell concentration was determined by manual cell counting under a microscope, and the suspension was diluted to an appropriate concentration for single cell encapsulation. The calculated volume of cell solution was transferred to a 1.5 mL centrifuge tube (Fisher Scientific) and washed twice in 1 mL PBS. The cells were re-suspended in a 1 mL solution of 1 mM Tris-HCl pH 8.0 (Teknova).

MALBAC Amplification of Single Cells: MALBAC amplification mix was prepared containing 2× ThermoPol Buffer (NEB, catalog no. B9004S), 2% (v/v) Tween 20, 6% (w/v) PEG-6000, 1.2 mM dNTPs (NEB, catalog no. N0447S), 0.64 μM GAT-8N primer (GTGAGTGATGGTTGAGGTAGTGTGGAGNNNNNNNN) (SEQ ID NO:1) (IDT), 4 mM MgSO₄ (NEB, catalog no. B1003S), and 0.12 U/μL Deep Vent (exo-) polymerase (NEB, catalog no. M0259S). The MALBAC amplification mix, cell suspension, and HFE-7500 fluorinated oil (3M) with 2% (w/w) PEG-PFPE amphiphilic block copolymer surfactant were loaded into separate 1 mL syringes and injected at 200, 200, and 1000 μL/hr, respectively, into a co-flow droplet maker (FIG. 21). The emulsion was collected in PCR tubes, and the oil underneath the emulsion removed via pipette and replaced with FC-40 fluorinated oil with 5% (w/w) PEG-PFPE amphiphilic block copolymer surfactant for improved thermal stability. The emulsion was thermal cycled (Bio-Rad, T100) with the following program: 95° C. for 5 min, followed by 8 cycles of 20° C. for 50 s, 30° C. for 50 s, 40° C. for 45 s, 50° C. for 45 s, 65° C. for 4 min, 95° C. for 20 s, and 58° C. for 20 s, followed by a hold at 12° C. Although this particular example did not use hydrogels, it is noted that hydrogels could have been integrated into the workflow in the place of the cell suspension.

Microfluidic Barcoding of Amplified Cells: MALBAC-amplified cell droplets, barcode droplets, and 300 μL of extension solution containing 1× ThermoPol Buffer, 1% (v/v) Tween 20, 3% (w/v) PEG-6000, 0.5 mM dNTPs, 2 mM MgSO₄, and 0.06 U/μL Deep Vent (exo-) polymerase were each loaded into a 1 mL syringe and injected into the merger device as shown in FIG. 22. HFE-7500 fluorinated oil with 2% (w/w) 008-Fluorosurfactant was used as the continuous phase of the emulsion. Merger of the barcode and cell droplet emulsions was achieved using an electrode connected to a cold cathode fluorescent inverter and DC power supply (Mastech). A voltage of 2.0 V at the power supply produced a ˜2 kV AC potential at the electrode, which caused touching droplets to merge. The emulsion was collected in PCR tubes and the HFE-7500 replaced with FC-40 with 5% (w/w) 008-Fluorosurfactant prior to single-cycle PCR with the following protocol: 95° C. for 2 min, 55° C. for 30 s, 72° C. for 5 min, and then 12° C. hold. After thermal cycling, the emulsion was broken by addition of 20 μL of perfluorooctanol and brief centrifugation in a micro-centrifuge. The upper aqueous phase was collected, and the DNA library was purified and primers were removed using Agencourt AMPure XP beads (Beckman Coulter) at a 0.5× volume ratio of beads to PCR product. The large bead-bound DNA fragments were eluted in water.

Digestion of Single-Stranded Hairpin Loops: Single-stranded MALBAC amplicons without a barcode were digested by mung bean nuclease, an endonuclease selectively targeting ssDNA. A digestion reaction was prepared containing the barcoded PCR product, 1× mung bean nuclease reaction buffer (NEB, catalog no. M0250S), and 0.033 U/μL mung bean nuclease (NEB, catalog no. M0250S). The reaction was incubated at 30° C. for 30 min and then stopped by addition of 0.5% (w/v) sodium dodecyl sulfate (Sigma-Aldrich) to a final concentration of 0.02% (w/v). A size selection with Agencourt AMPure XP beads at a 0.5× volume ratio of beads to PCR product was performed to remove small digestion products.

Enrichment PCR of Barcoded Genomic DNA: The barcoded DNA product from the digestion reaction was amplified by PCR in a solution containing 1× Invitrogen Platinum Multiplex PCR Master Mix (Thermo Fisher Scientific, catalog no. 4464268), 0.5 μM GAT-COM primer (GTGAGTGATGGTTGAGGTAGTGTGGAG) (SEQ ID NO:2) (IDT), and 0.5 μM BD24 primer (GTATGCGACTTCAGTACATCGTCCCGATGCTCTGCACAGTCGACCGCTGA) (SEQ ID NO:12) (IDT) using the following protocol: 95° C. for 1 min, 25 cycles of 95° C. for 20 s, 80° C. for 5 s, 55° C. for 20 s (0.3° C./s ramp rate), 72° C. for 3 min, followed by a final 5 min extension at 72° C. with 12° C. hold. The PCR product was purified using a Zymo DNA Clean & Concentrator-5 kit (Zymo Research). Fragment size selection with Agencourt AMPure XP beads at a 0.5× volume ratio of beads to PCR product was performed to remove PCR primers. The DNA fragments before (FIG. 23) and after (FIG. 24) size selection were quantified with a Bioanalyzer 2100 instrument and High Sensitivity DNA chip (Agilent). Barcoded dsDNA fragments had an average length of approximately 1500 bp.

Library Preparation and Next-Generation Sequencing: The Nextera XT DNA Library Prep Kit (Illumina, catalog no. FC-131-1024) was used to prepare a sequencing library according to the manufacturer protocol. Briefly, 600 pg of the barcoded DNA was used as input to the tagmentation reaction. After neutralization, the transposomes were amplified by 12 cycles of PCR using Nextera PCR Master Mix, 0.2 μM custom primer BD29 (AATGATACGGCGACCACCGAGATCTACACGTATGCGACTTCAGTACATCGTCCCGATGCTCTGCACAGTCGACCGCTGA (SEQ ID NO:13), containing P5 sequencing adapter) (IDT) and 0.2 μM of Illumina Nextera N706 primer (CAAGCAGAAGACGGCATACGAGATCATGCCTAGTCTCGTGGGCTCGG (SEQ ID NO:14), containing P7 sequencing adapter) (IDT). A final size selection for 300-600 bp DNA fragments was conducted using a BluePippin instrument (Sage Science). The library was quantified by Qubit 3.0 Fluorometer (Invitrogen) and Bioanalyzer 2100 instrument with High Sensitivity DNA chip. The library was sequenced on a MiSeq instrument using a 300-cycle MiSeq Reagent Nano Kit v2 (Illumina, catalog no. MS-103-1001). Custom Read 1 (CAGCGATCCCGAGTCAGATCGATGCCACTGAGACCTGTGAGTGATGGTTGAGGTAGTGTGGAG) (SEQ ID NO:15) (IDT) and Index 1 (GTATGCGACTTCAGTACATCGTCCCGATGCTCTGCACAGTCGACCGCTGA) (SEQ ID NO:12) (IDT) primers were used according to the manufacturer protocol.

Generating the SiC-Reads Database: Raw reads from the MiSeq-generated FASTQ files were filtered by quality and grouped by barcode sequence using the Python script barcodeCleanup.py. A given read was discarded if more than 20% of its bases had a Q-score less than Q20, and all reads associated with a barcode containing fewer than 50 reads were discarded. This step ensured that all barcode groups, representing single cells, contained a sufficient number of high-quality reads. The resulting reads were exported to a table in a SQLite database with fields containing the barcode sequence, barcode group size, a unique read ID number, and read sequence. For the synthetic cell population experiment, the reads were aligned using bowtie2 v2.2.9 with default settings and the SQLite table was updated with relevant alignment information for each read.
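For illustration only, the two filtering rules described above can be sketched in Python as shown below. This sketch is not the barcodeCleanup.py script referenced in this example; the function names and the input format (tuples of barcode, read sequence, and a list of per-base Phred scores) are assumptions made for the sketch.

```python
from collections import defaultdict


def passes_quality(quals, min_q=20, max_low_fraction=0.20):
    """Keep a read unless more than 20% of its bases score below Q20."""
    if not quals:
        return False  # a read with no quality scores cannot be assessed
    low = sum(1 for q in quals if q < min_q)
    return low / len(quals) <= max_low_fraction


def filter_and_group(reads, min_group_size=50):
    """Group quality-passing reads by barcode and drop small groups.

    reads: iterable of (barcode, sequence, quals) tuples, where quals is a
    list of integer Phred scores (hypothetical input format).
    Returns a dict mapping barcode -> list of read sequences, keeping only
    barcode groups with at least min_group_size reads.
    """
    groups = defaultdict(list)
    for barcode, sequence, quals in reads:
        if passes_quality(quals):
            groups[barcode].append(sequence)
    return {
        barcode: seqs
        for barcode, seqs in groups.items()
        if len(seqs) >= min_group_size
    }
```

In this sketch the size threshold is applied after quality filtering, so a barcode group is retained only if it still holds at least 50 reads once low-quality reads have been removed; the order of the two steps in the original script is not stated and is assumed here.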

Results:

Library sequencing yielded 800,189 reads, 596,357 (74.5%) of which belong to a barcode group containing a minimum of 50 reads. 2,186 barcode groups (min 50 reads) were obtained. Of reads belonging to these barcode groups, 99.3% aligned to one of the ten reference genomes from the synthetic community. The purity metric for evaluating cross-contamination between barcodes was defined as follows: the purity of a barcode group is defined as the fraction of aligned reads in a barcode group which align to the most common reference genome in that group. A purity of 1.0 means that all reads in a given barcode group align to the same genome. FIG. 25 shows the distribution of purity scores for all barcode groups in the experiment (min. 50 reads per group). The groups are generally highly pure, with an average purity of 0.950. Barcodes with a greater number of reads maintain a high purity (FIG. 26). Reads within individual barcode groups align across the entire reference genomes (FIG. 27), a result of the low-bias amplification characteristics of MALBAC. 

1. A method of sequencing single cell genomic DNA, the method comprising: encapsulating a population of single cells in molten gel droplets to provide a population of molten gel droplets, wherein each molten gel droplet of the population contains zero or one cell; solidifying the population of molten gel droplets to provide a population of solidified microgel droplets; breaking the emulsions of the solidified microgel droplets to provide a population of solidified microgels; exposing the population of solidified microgels in bulk to lysis conditions sufficient to lyse cells contained within the population of solidified microgels; purifying genomic DNA from cells contained within the population of solidified microgels in bulk to provide a population of solidified microgels comprising purified genomic DNA; encapsulating the population of solidified microgels comprising purified genomic DNA into droplets to provide a population of purified genomic DNA-containing droplets; fragmenting the purified genomic DNA within the population of purified genomic DNA-containing droplets to provide a population of fragmented genomic DNA-containing droplets; barcoding the fragmented genomic DNA or an amplification product thereof in the population of fragmented genomic DNA-containing droplets to provide a population of barcoded, fragmented genomic DNA-containing droplets; purifying barcoded, fragmented genomic DNA from the barcoded, fragmented genomic DNA-containing droplets to provide purified, barcoded, fragmented genomic DNA; and sequencing the purified, barcoded, fragmented genomic DNA.
 2. The method of claim 1, wherein the barcoding comprises merging each of the fragmented genomic DNA-containing droplets with a barcode containing droplet.
 3. The method of claim 2, wherein each of the barcode containing droplets comprises a unique nucleic acid barcode sequence.
 4. The method of claim 1, wherein the method comprises incorporating an adaptor nucleic acid sequence into the fragmented genomic DNA.
 5. The method of claim 1, wherein the population of single cells comprises eukaryotic cells.
 6. The method of claim 5, wherein the population of single cells comprises mammalian cells.
 7. The method of claim 1, wherein the population of single cells comprises bacterial cells.
 8. The method of claim 1, wherein the population of single cells comprises fungal cells.
 9. The method of claim 1, wherein the molten gel droplet comprises a hydrogel polymer.
 10. The method of claim 9, wherein the hydrogel polymer comprises a thermoresponsive polymer.
 11. The method of claim 10, wherein the thermoresponsive polymer is agarose.
 12. The method of claim 1, wherein the solidifying comprises cooling the population of molten gel droplets.
 13. The method of claim 1, wherein the molten gel droplet comprises polyethylene glycol (PEG).
 14. The method of claim 13, wherein the solidifying comprises chemically crosslinking the PEG.
 15. The method of claim 13, wherein the solidifying comprises photo-crosslinking the PEG.
 16. The method of claim 1, wherein the molten gel droplet comprises acrylamide.
 17. The method of claim 16, wherein the solidifying comprises chemically crosslinking the acrylamide.
 18. The method of claim 16, wherein the solidifying comprises photo-crosslinking the acrylamide.
 19. The method of claim 1, wherein the molten gel droplet comprises alginate.
 20. The method of claim 19, wherein the solidifying comprises adding calcium to the molten gel droplet.
 21. The method of claim 1, wherein the solidified microgels comprise pores sized to retain genomic DNA within the solidified microgels.
 22. The method of claim 1, wherein the step of encapsulating the population of single cells in molten gel droplets comprises the addition of an oil.
 23. The method of claim 1, wherein the exposing comprises contacting the population of solidified microgels in bulk with a lytic enzyme to lyse cells contained within the population of solidified microgels.
 24. The method of claim 23, wherein the lytic enzyme is selected from zymolyase, lysostaphin, mutanolysin, lysozyme, or a combination thereof.
 25. The method of claim 1, wherein the step of purifying genomic DNA from cells contained within the population of solidified microgels comprises contacting the population of solidified microgels with a detergent to solubilize cellular material contained within the population of solidified microgels.
 26. The method of claim 25, wherein the detergent is selected from lithium dodecyl sulfate, sodium dodecyl sulfate, or a combination thereof.
 27. The method of claim 1, wherein the step of purifying genomic DNA from cells contained within the population of solidified microgels comprises contacting the population of solidified microgels with a protease to digest cellular proteins contained within the population of solidified microgels.
 28. The method of claim 27, wherein the protease is proteinase K.
 29. The method of claim 1, wherein the step of purifying genomic DNA from cells contained within the population of solidified microgels comprises a step of washing the population of solidified microgels, wherein the step of washing the population of solidified microgels comprises contacting the population of solidified microgels with a washing buffer.
 30. The method of claim 29, wherein each of the population of purified genomic DNA-containing droplets comprises a complex comprising a transposase and a transposon.
 31. The method of claim 30, wherein the step of fragmenting comprises contacting the purified genomic DNA with a complex comprising a transposase and a transposon.
 32. The method of claim 31, wherein the complex comprises a transposon that comprises an adapter sequence.
 33. The method of claim 32, wherein contacting the purified nucleic acids with the complex provides fragmented genomic DNA comprising the adapter sequence.
 34. The method of claim 1, wherein the step of encapsulating a population of single cells in molten gel droplets and the step of barcoding the fragmented genomic DNA or an amplification product thereof are performed using a microfluidic device.
 35. The method of claim 1, wherein one or more of the steps of solidifying the population of molten gel droplets to provide a population of solidified microgel droplets; breaking the emulsions of the solidified microgel droplets to provide a population of solidified microgels; exposing the population of solidified microgels in bulk to lysis conditions sufficient to lyse cells contained within the population of solidified microgels; purifying genomic DNA from cells contained within the population of solidified microgels in bulk to provide a population of solidified microgels comprising purified genomic DNA; and fragmenting the purified genomic DNA within the population of solidified microgels comprising purified genomic DNA in bulk to provide a population of solidified microgels comprising fragmented genomic DNA, are not performed using a microfluidic device.
 36. The method of claim 1, wherein the population of single cells is a heterogeneous population of single celled microorganisms.
 37. (canceled) 